MLOps Engineer

Arrayo
Boston, MA, United States
4 days ago
Job type
  • Full-time
Job description

MLOps Engineer (Training Scalability & Workflow Optimization)

We are seeking an MLOps Engineer to lead the scaling of our machine learning training pipelines and ensure the robustness and efficiency of our end-to-end ML workflows. This role focuses on leveraging Flyte, Kubernetes (GPU optimization), Docker, and distributed training frameworks such as Ray to optimize and streamline our ML infrastructure.

Responsibilities

  • Workflow Orchestration: Develop and maintain ML workflows in Flyte to manage complex pipelines for training, testing, and deployment.
  • Training Scalability: Architect and scale ML training systems on GPU-backed Kubernetes clusters, including auto-scaling and performance tuning for multi-node/multi-GPU workloads.
  • Distributed Computing: Implement distributed model training pipelines using frameworks like Ray for parallelization and resource efficiency.
  • Containerization: Design, build, and optimize Docker images for ML workloads with a focus on reproducibility and security.
  • Resource Optimization: Debug and optimize GPU utilization, memory, and compute bottlenecks during training and inference.
  • Monitoring & Maintenance: Integrate monitoring for ML jobs, track resource consumption, and enforce cost-efficient resource utilization.
  • Collaboration: Work closely with data scientists and ML engineers to productize and scale ML experiments.

Qualifications

  • Strong proficiency with Kubernetes (GPU scheduling, Helm, cluster autoscaling).
  • Hands-on experience with Flyte or similar workflow orchestration tools (Airflow, Prefect).
  • Deep knowledge of distributed ML training (e.g., PyTorch DDP, Ray, Horovod).
  • Expertise in Docker and container lifecycle management.
  • Solid understanding of the GPU hardware/software stack (CUDA, NCCL).
  • Familiarity with CI/CD for ML (MLOps pipelines using tools like GitHub Actions, ArgoCD).
  • Bonus: Familiarity with observability tools for ML systems (Prometheus, Grafana).
