AIJobsUAE

    ML Infra Engineer (m/f/d)

    Halian | Managed Services, Recruitment Agency & Contract Staffing
    Abu Dhabi Emirate, United Arab Emirates · Full-time · Posted 20 Jun 2026
    Staffing and Recruiting

    Job Description

    Role Overview

    This role focuses on designing and building the infrastructure that enables scalable machine learning development, from training-ready datasets through to validated models deployed in production environments.

    The position involves establishing core systems and architectural foundations that will support long-term scalability and performance. Key areas include training infrastructure, distributed learning frameworks, experiment management, model lifecycle management, and reliable pathways from model development to production deployment.

    Key Responsibilities

    • Training infrastructure: Design, deploy, and operate GPU-based training environments across cloud platforms such as AWS and GCP. This includes node provisioning, workload scheduling (e.g., Kubernetes, Slurm), multi-node networking, GPU monitoring, and cost/utilization optimization.
    • Distributed training systems: Own and optimize distributed training frameworks such as PyTorch DDP and FSDP. Implement and tune strategies including mixed precision, gradient checkpointing, activation offloading, and parallelism approaches to ensure efficient large-scale training.
    • Training data I/O performance: Develop high-throughput data loading and storage access patterns to support multi-GPU and multi-node training. Implement techniques such as data sharding, prefetching, local NVMe caching, and resumable data pipelines. Contribute to dataset format design with a focus on efficient read performance.
    • Experiment tracking and model management: Implement and maintain experiment tracking and model registry systems using platforms such as MLflow or Weights & Biases. Ensure reproducibility, traceability, and comparison of experiments through proper artifact and checkpoint management.
    • ML CI/CD pipelines: Build automated pipelines for training, evaluation, and deployment readiness. Establish validation gates, regression testing, and controlled promotion of models across different lifecycle stages.
    • Model packaging and deployment pipelines: Develop reliable CI/CD workflows for model conversion, benchmarking, and packaging using tools such as ONNX, TensorRT, SNPE, or TIDL. Ensure all artifacts are properly versioned and tracked with full lineage.
    • Production monitoring and feedback loops: Design and implement systems that capture model performance in production environments. Enable continuous feedback loops by feeding operational data back into retraining and evaluation pipelines.

    Required Experience

    • 5+ years of experience building and operating ML infrastructure for production-grade deep learning systems, ideally including computer vision or perception-based workloads
    • Strong proficiency in Python, with working knowledge of C++ for inference runtimes and deployment tooling
    • Deep hands-on experience with distributed training at scale (DDP, FSDP), with the ability to troubleshoot performance, stability, and memory issues
    • Experience operating GPU clusters on AWS and/or GCP, including scheduling frameworks (e.g., Kubernetes, Slurm) and understanding of networking and storage trade-offs
    • Proven experience with experiment tracking, model registries, and ML CI/CD workflows in production environments
    • Track record of building end-to-end infrastructure supporting the full model lifecycle, from training through deployment

    Preferred Qualifications

    • Background in autonomous systems, perception, or complex real-world ML applications
    • Familiarity with inference optimization toolchains (TensorRT, ONNX, SNPE, TIDL) and their integration into deployment pipelines
    • Understanding of multimodal or sensor-based datasets and formats (e.g., MCAP, rosbag2)
    • Experience working with real-time inference constraints and deployment environments requiring optimized performance



