Talent.com
Machine Learning Engineer, Training Infrastructure
Hedra • San Francisco, CA, US
30+ days ago
Job type
  • Full-time
Job description

Overview

We are looking for an ML Engineer with 3+ years of experience in high-performance computing systems to manage and optimize our computational infrastructure for training and deploying our machine learning models. The ideal candidate has diverse experience managing ML workloads at scale, supporting our 3DVAE and video diffusion models. We encourage you to apply even if you don't meet every requirement; curiosity, creativity, and the drive to solve hard problems are valued.

Responsibilities
  • Design, implement, and maintain scalable computing solutions for training and deploying ML models, ensuring infrastructure can handle large video datasets.
  • Manage and optimize the performance of computing clusters or cloud instances (e.g., AWS, Google Cloud) to support distributed training.
  • Ensure infrastructure can handle resource-intensive tasks associated with training large generative models.
  • Monitor system performance and implement improvements to maximize efficiency and utilization, using tools like Airflow for orchestration.
  • Collaborate with research teams to understand computational needs and provide appropriate solutions, facilitating seamless model deployment.

Qualifications
  • Bachelor's degree in Computer Science, Information Technology, or a related field, with a focus on system administration.
  • Experience with cloud platforms such as Amazon Web Services, Google Cloud, or Microsoft Azure.
  • Experience with version control and CI/CD processes.
  • Knowledge of containerization technologies like Docker and Kubernetes for deployments at scale.
  • Understanding of distributed training techniques and scaling models across multi-node clusters, aligned with video generation needs.
  • Strong problem-solving and communication skills for collaboration with diverse teams.

Benefits
  • Competitive compensation + equity
  • 401k (no match)
  • Healthcare (Silver PPO Medical, Vision, Dental)
  • Lunch and snacks at the office

Additional
  • Seniority level: Mid-Senior level
  • Employment type: Full-time
  • Job function: Engineering and Information Technology
  • Industries: Technology, Information and Internet
