Talent.com
Machine Learning - Infrastructure
Causal Labs, Inc. • San Francisco, CA, United States
30+ days ago
Job type
  • Full-time
Job description

About us

Our mission is to build causal intelligence, starting with physics models to predict and control the weather.

We're building a small team driven by a deep passion and urgency to solve this civilizationally important problem.

Our founding team has led & shipped models across self‑driving cars, humanoid robotics, protein folding, and video generation at world‑class institutions including Google DeepMind, Cruise, Waymo, Meta, Nabla Bio, and Apple.

Responsibilities

  • Design, deploy, and maintain large distributed ML training and inference clusters
  • Develop efficient, scalable end‑to‑end pipelines to manage petabyte‑scale datasets and model training throughout the entire ML lifecycle
  • Research and test various training approaches including parallelization techniques and numerical precision trade‑offs across different model scales
  • Analyze, profile, and debug low‑level GPU operations to optimize performance
  • Stay up‑to‑date on research to bring new ideas to work

What we’re looking for

We value a relentless approach to problem‑solving, rapid execution, and the ability to quickly learn in unfamiliar domains.

  • Strong grasp of state‑of‑the‑art techniques for optimizing training and inference workloads
  • Demonstrated proficiency with distributed training frameworks (e.g., FSDP, DeepSpeed) to train large foundation models
  • Knowledge of cloud platforms (GCP, AWS, or Azure) and their ML / AI service offerings
  • Familiarity with containerization and orchestration frameworks (e.g., Kubernetes, Docker)
  • Background working on distributed task management systems and scalable model serving & deployment architectures
  • Understanding of monitoring, logging, observability, and version control best practices for ML systems

You don’t have to meet every single requirement above.

Benefits

  • Work on deeply challenging, unsolved problems
  • Competitive cash and equity compensation
  • Medical, dental, and vision insurance
  • Catered lunch & dinner
  • Unlimited paid time off
  • Visa sponsorship & relocation support