Research Engineer, Training Infrastructure Lead

Menlo Ventures • San Francisco, CA, US
1 day ago
Job type: Full-time
Job description

About Goodfire

Behind our name: Like fire, AI holds the potential for both immense benefit and significant risk. Just as mastering fire transformed human history, we believe the safe and intentional development of AI will shape the future of our species. Our goal is to tame this new fire.

Goodfire is an AI interpretability research company focused on understanding and designing AI systems that people can trust. Our mission is to advance humanity's understanding of AI to build safe and powerful AI systems. We believe that deep research breakthroughs are necessary to make this possible.

Goodfire is a public benefit corporation headquartered in San Francisco with a team of the world's top interpretability researchers and engineers from organizations like OpenAI and DeepMind. We've raised $59M from investors including Menlo, Lightspeed, and Anthropic, and we work with customers including Arc Institute, Mayo Clinic, and Rakuten.

About the role

We're seeking a senior engineering leader to own and evolve our research platform and training infrastructure. You'll define both the technical vision and the implementation strategy for the systems that power our research breakthroughs.

Key responsibilities

Design and build customizable training pipelines that scale from experimentation to production

Architect and implement large-scale model serving infrastructure for interpretability (for reference: NDIF, Garcon)

Identify and execute on opportunities to dramatically accelerate research velocity

Lead technical decision-making for infrastructure that supports cutting-edge AI research

What you'll bring

Required experience

5+ years of experience in ML infrastructure, research engineering, and/or systems programming

Leadership experience as a senior architect, tech lead, and/or engineering manager

Cross-functional expertise bridging research and engineering domains

Technical proficiency in Python, PyTorch/JAX, and distributed systems

Production experience deploying and maintaining ML systems at scale

Mission alignment with advancing AI safety and interpretability

Core competencies

High-ownership leadership

Owns broad areas with autonomy, driving architectural and strategic decisions even amid uncertainty

Balances technical depth with speed, adapting as priorities evolve

Research-to-production mindset

Bridges fast research iteration with reliable, scalable production systems

Designs abstractions that preserve flexibility while ensuring robustness

Modern ML & infrastructure expertise

Deep experience in Python, PyTorch, and large-scale training strategies

Hands-on with end-to-end ML infrastructure, from experiments to serving

Strong track record of scaling systems and debugging complex runs

Preferred qualifications

Contributions to open-source ML infrastructure projects

Experience in fast-paced startup or research lab environments

Our values

Goodfire is looking for individuals who embody our values and share our deep commitment to making interpretability accessible. We are building a team first and foremost.

Put mission and team first

All we do is in service of our mission. We trust each other, deeply care about the success of the organization, and choose to put our team above ourselves.

Improve constantly

We are constantly looking to improve every piece of the business. We proactively critique ourselves and others in a kind and thoughtful way that translates to practical improvements in the organization. We are pragmatic and consistently implement the obvious fixes that work.

Take ownership and initiative

There are no bystanders here. We proactively identify problems and take full responsibility over getting a strong result. We are self-driven, own our mistakes, and feel deep responsibility over what we're building.

Action today

We have a small amount of time to do something incredibly hard and meaningful. The pace and intensity of the organization are high. If we can take action today or tomorrow, we will choose to do it today.

What we offer

This role offers a market-competitive salary, equity, and competitive benefits.

The expected salary range for this position is $200,000 to $400,000 USD.

Most importantly, you'll have the opportunity to join a vital mission at an important point in its trajectory — we are developing groundbreaking technology with a world-class team on the critical path to ensuring a safe and beneficial future for humanity. If you want to do your life's work with us, even if you believe you do not meet every single requirement, apply now.
