Machine Learning - Infrastructure

Causal Labs
San Francisco, CA, United States
15 hours ago
Job type
  • Full-time
Job description

About us

Our mission is to build causal intelligence, starting with physics models to predict and control the weather.

We're building a small team driven by a deep passion and urgency to solve this civilizationally important problem.

Our founding team has led and shipped models across self‑driving cars, humanoid robotics, protein folding, and video generation at world‑class institutions including Google DeepMind, Cruise, Waymo, Meta, Nabla Bio, and Apple.

Responsibilities

  • Design, deploy, and maintain large distributed ML training and inference clusters
  • Develop efficient, scalable end‑to‑end pipelines to manage petabyte‑scale datasets and model training throughout the entire ML lifecycle
  • Research and test various training approaches including parallelization techniques and numerical precision trade‑offs across different model scales
  • Analyze, profile, and debug low‑level GPU operations to optimize performance
  • Stay up‑to‑date on research to bring new ideas to work

What we’re looking for

We value a relentless approach to problem‑solving, rapid execution, and the ability to quickly learn in unfamiliar domains.

  • Strong grasp of state‑of‑the‑art techniques for optimizing training and inference workloads
  • Demonstrated proficiency with distributed training frameworks (e.g., FSDP, DeepSpeed) to train large foundation models
  • Knowledge of cloud platforms (GCP, AWS, or Azure) and their ML/AI service offerings
  • Familiarity with containerization and orchestration frameworks (e.g., Kubernetes, Docker)
  • Background working on distributed task management systems and scalable model serving & deployment architectures
  • Understanding of monitoring, logging, observability, and version control best practices for ML systems

You don’t have to meet every single requirement above.

Benefits

  • Work on deeply challenging, unsolved problems
  • Competitive cash and equity compensation
  • Medical, dental, and vision insurance
  • Catered lunch & dinner
  • Unlimited paid time off
  • Visa sponsorship & relocation support