Open roles at Hays
The job holder is responsible for designing, building, and operating production-grade AI agents (LLM-powered systems that reason, use tools, and orchestrate multi-step workflows against enterprise data and services) on Microsoft Foundry. This is a hands-on, pro-code role: the engineer writes Python, integrates Azure services, and ships agentic applications that drive measurable business impact. The engineer works within agile teams alongside Product, Data Science, and Architecture to take agent use cases from prototype to production.

Key Responsibilities:

AI Agent Development on Microsoft Foundry
- Design, build, and deploy pro-code AI agents on Microsoft Foundry, leveraging its agent framework, model catalog, and tool/function-calling capabilities.
- Integrate agents with Palantir Foundry via MCP (Model Context Protocol) to fetch enterprise data and execute Ontology actions, and implement other tool, retrieval, memory, and multi-agent orchestration patterns in Python.
- Build evaluation harnesses and guardrails (groundedness, safety, cost, latency), and iterate on prompts, tools, and architectures based on measured outcomes.

LLM Engineering & Productionization
- Apply prompt engineering, structured output, RAG, and grounding techniques appropriate to each use case.
- Operate agents in production: observability, tracing, cost monitoring, regression testing, and incident response.
- Version prompts, tools, and agent configurations alongside code.

Azure & Platform Integration
- Integrate with Azure services (Azure OpenAI, AI Search, Storage, Functions, Container Apps, Key Vault, Entra ID) to deliver secure, scalable agent solutions.
- Build and maintain CI/CD pipelines for agent deployment, including automated evals as a release gate.
- Implement authentication, authorization, secret management, and data-handling patterns aligned with enterprise security standards.

Cross-functional Collaboration
- Work closely with Product Managers, Data Scientists, and IT Architects to translate agent use cases into delivery plans.
- Communicate trade-offs (model choice, latency, cost, accuracy) clearly to technical and non-technical stakeholders.
- Champion agentic patterns and reusable components across the engineering organization.

Qualifications:
- Software engineering experience, including hands-on production experience building LLM-based / agentic systems (tool use, orchestration, RAG, evals)
- Bachelor’s degree in Computer Science, Engineering, or a related quantitative field; advanced degrees preferred
- Strong Python (idiomatic, typed, tested); comfortable with async and modern packaging
- Proven experience building production AI agents in code (e.g., Microsoft Foundry / Azure AI Foundry, Semantic Kernel, AutoGen, LangGraph, LlamaIndex, OpenAI Agents SDK, or equivalent custom frameworks)
- Microsoft Foundry experience strongly preferred (agent authoring, model catalog, connectors, deployment)
- Azure exposure: Azure OpenAI, AI Search, identity, storage, serverless compute; familiarity with Azure DevOps or GitHub Actions
- Deep working knowledge of LLM application patterns: tool/function calling, structured outputs, RAG, memory, multi-agent orchestration
- Experience with evaluation frameworks and observability for LLM systems (tracing, prompt/response logging, automated evals)
- DevOps fundamentals: Git, CI/CD, containers, infrastructure-as-code; exposure to data engineering and vector stores is a plus
- High-energy, ownership-driven mindset
- Strong problem-solving and analytical skills
- Excellent communication and collaboration abilities
- Comfortable navigating ambiguity and driving clarity
- Experience in agile/start-up environments preferred
- Track record of taking AI/ML or agent projects from prototype to production
We are looking for a hands-on AI/ML Engineer to own and execute MLOps, evaluation, and deployment practices for a production AI platform built on LLMs, agentic workflows, vision, and voice AI.

This role is strongly execution-focused. You will work across the entire AI lifecycle, from evaluation and observability to RLHF, deployment in constrained environments, and production-readiness sign-off, while collaborating with internal teams and directing external vendors.

Key Responsibilities

MLOps & Deployment Ownership
- Define and oversee MLOps practices, including:
  - Agent and model versioning
  - Evaluation tracking
  - Deployment gating and promotion workflows
  - Rollback and recovery procedures
- Collaborate with internal stakeholders and external delivery teams to ensure reliable production deployments.

Evaluation, Monitoring & Observability
- Own the evaluation framework for:
  - LLM-based agents
  - RAG pipelines
  - Vision Language Models (VLMs)
  - Voice AI models (OpenAI Whisper, Chatterbox, Vibe Voice, or equivalent)
- Define and maintain:
  - Offline evaluation methodologies
  - Online monitoring and regression-detection thresholds
  - Human-in-the-loop review processes
- Set up and manage AI observability tooling (e.g., Langfuse or equivalent) across all environments.

Performance Reporting & Insights
- Build and maintain product performance reporting covering:
  - Model accuracy and agent effectiveness
  - Latency and cost per interaction
  - Bias, quality trends, and stability across markets
- Provide clear technical insights to non-technical stakeholders.

RLHF & Continuous Improvement
- Design and oversee RLHF (Reinforcement Learning from Human Feedback) pipelines:
  - Data collection and feedback ingestion
  - Annotation guidelines and reward criteria
  - Feedback loops for continuous improvement
- Direct implementation by external teams and monitor quality improvements over time.

Agent Memory Systems
- Own the design and validation of agent memory architectures, including:
  - Short-term context windows
  - Long-term retrieval
  - Episodic memory across sessions
  - Memory lifecycle policies (retention, expiry, cost control)
- Define test criteria to ensure consistency across deployment environments.

Model Benchmarking & Optimization
- Evaluate and benchmark VLMs and voice models under constrained infrastructure.
- Recommend optimization strategies:
  - Quantization
  - Distillation
  - Runtime and model selection per jurisdiction
- Validate production readiness in on-prem or sovereign environments.

Production Readiness & Rollouts
- Oversee production deployments executed by vendor teams.
- Run final validation checks and sign off on production readiness.
- Document deployment patterns, baselines, and environment-specific configurations to accelerate future market rollouts.

Privacy & Data Residency
- Evaluate and recommend privacy-preserving deployment patterns, including:
  - On-device inference
  - Data isolation
  - Local or sovereign model hosting
- Ensure compliance with jurisdictional data-residency requirements.

Technical Requirements
- 3–5 years of experience in applied AI, LLMOps, MLOps, or similar technical AI roles.
- Strong Python expertise:
  - Type hints, async programming, FastAPI
  - Code reviews, evaluation scripts, prototyping pipelines
- Experience with LLM application patterns:
  - RAG pipelines
  - Prompt engineering
  - Multi-agent orchestration
- Solid background in supervised ML (scikit-learn, XGBoost, LightGBM, or equivalent).
- Strong understanding of MLOps fundamentals:
  - Model versioning
  - Experiment tracking
  - CI/CD deployment pipelines
  - Monitoring and rollback strategies
- Hands-on experience with:
  - RLHF or human-feedback-driven improvement loops
  - LLM/VLM/voice AI evaluation frameworks
  - Agent memory architectures
- Working knowledge of:
  - Vision Language Models (VLMs)
  - Voice AI systems across latency, language, and hosting constraints
- Understanding of model optimization techniques (quantization, distillation, ONNX).
- Experience using AI observability tools (Langfuse, LangSmith, or equivalent).
- Comfortable directing or overseeing external/vendor engineering teams.
- Ability to work independently in ambiguous and non-standard infrastructure environments.

Good to Have
- Experience with sovereign cloud or government-regulated infrastructure.
- Familiarity with agentic AI frameworks (LangChain/LangGraph, CrewAI, PydanticAI).
- Exposure to federated learning or privacy-preserving inference.
- Background in healthcare, insurance, or regulated domains.
- Experience building performance dashboards for non-technical audiences.

Immediate joiner required.