Talent.com
Machine Learning Engineer, Training Infrastructure

IntelliPro Group
San Francisco, CA, United States
16 hours ago
Job type
  • Full-time
Job description

We are looking for an ML Engineer with 3+ years of experience in high-performance computing systems to manage and optimize our computational infrastructure for training and deploying our machine learning models. The ideal candidate has diverse experience managing ML workloads at scale and will support our 3DVAE and video diffusion models. We encourage you to apply even if you don't meet every requirement. We value curiosity, creativity, and the drive to solve hard problems.

Responsibilities

  • Design, implement, and maintain scalable computing solutions for training and deploying ML models, ensuring infrastructure can handle large video datasets.
  • Manage and optimize the performance of our computing clusters or cloud instances, such as AWS or Google Cloud, to support distributed training.
  • Ensure that our infrastructure can handle the resource-intensive tasks associated with training large generative models.
  • Monitor system performance and implement improvements to maximize efficiency and utilization, using tools like Airflow for orchestration.
  • Collaborate across research teams to understand their computational needs and provide appropriate solutions, facilitating seamless model deployment.

Requirements

  • Bachelor's degree in Computer Science, Information Technology, or a related field, with a focus on system administration.
  • Experience with cloud computing platforms such as Amazon Web Services, Google Cloud, or Microsoft Azure, essential for managing large-scale ML workloads.
  • This role is vital for ensuring the computational backbone supports the company's ML efforts, focusing on deployment and scalability.
  • Values sound engineering processes, including version control and CI/CD.
  • Knowledge of containerization technologies such as Docker and Kubernetes, required for deployments at scale.
  • Understanding of distributed training techniques and how to scale models across multi-node clusters, in line with video generation needs.
  • Strong problem-solving and communication skills, given the need to collaborate with diverse teams.

Founded in 2009, IntelliPro is a global leader in talent acquisition and HR solutions. Our commitment to delivering unparalleled service to clients, fostering employee growth, and building enduring partnerships sets us apart. We continue leading global talent solutions with a dynamic presence in over 160 countries, including the USA, China, Canada, Singapore, Japan, Philippines, UK, India, Netherlands, and the EU. IntelliPro, a global leader connecting individuals with rewarding employment opportunities, is dedicated to understanding your career aspirations.

As an Equal Opportunity Employer, IntelliPro values diversity and does not discriminate based on race, color, religion, sex, sexual orientation, gender identity, national origin, age, genetic information, disability, or any other legally protected group status. Moreover, our Inclusivity Commitment emphasizes embracing candidates of all abilities and ensures that our hiring and interview processes accommodate the needs of all applicants.

The pay offered to a successful candidate will be determined by various factors, including education, work experience, location, job responsibilities, certifications, and more. Additionally, IntelliPro provides a comprehensive benefits package, all subject to eligibility.