Talent.com
Machine Learning Engineer, Training Infrastructure

Hedra
San Francisco, California, United States
30+ days ago
Job type
  • Full-time
Job description

About Hedra

Hedra is a pioneering generative media company backed by top investors at Index, A16Z, and Abstract Ventures. We're building Hedra Studio, a multimodal creation platform capable of control, emotion, and creative intelligence.

At the core of Hedra Studio is our Character-3 foundation model, the first omnimodal model in production. Character-3 jointly reasons across image, text, and audio for more intelligent video generation — it’s the next evolution of AI-driven content creation.

At Hedra, we’re a team of hard-working, passionate individuals seeking to fundamentally change content creation and build a generational company together. We value startup energy, initiative, and the ability to turn bold ideas into real products. Our team is fully in-person in SF / NY with a shared love for whiteboard problem-solving.

Overview

We are looking for an ML Engineer with 3+ years of experience in high-performance computing systems to manage and optimize our computational infrastructure for training and deploying our machine learning models. The ideal candidate has diverse experience managing ML workloads at scale and will support our 3DVAE and video diffusion models. We encourage you to apply even if you don't meet every requirement; we value curiosity, creativity, and the drive to solve hard problems.

Responsibilities

Design, implement, and maintain scalable computing solutions for training and deploying ML models, ensuring infrastructure can handle large video datasets.

Manage and optimize the performance of our computing clusters or cloud instances, such as AWS or Google Cloud, to support distributed training.

Ensure that our infrastructure can handle the resource-intensive tasks associated with training large generative models.

Monitor system performance and implement improvements to maximize efficiency and utilization, using tools like Airflow for orchestration.

Collaborate across research teams to understand their computational needs and provide appropriate solutions, facilitating seamless model deployment.

Qualifications

Bachelor’s degree in Computer Science, Information Technology, or a related field, with a focus on system administration.

Experience with cloud computing platforms such as Amazon Web Services, Google Cloud, or Microsoft Azure, essential for managing large-scale ML workloads.

Values sound engineering processes, including version control and CI/CD.

Knowledge of containerization technologies like Docker and Kubernetes, required for deployments at scale.

Understanding of distributed training techniques and how to scale models across multi-node clusters to meet video generation needs.

Strong problem-solving and communication skills, given the need to collaborate with diverse teams.

This role is vital for ensuring the computational backbone supports the company’s ML efforts, focusing on deployment and scalability.

Benefits

Competitive compensation + equity

401k (no match)

Healthcare (Silver PPO Medical, Vision, Dental)

Lunch and snacks at the office
