Talent.com
Machine Learning - Infrastructure
Causal Labs • San Francisco, CA, United States
5 days ago
Job type
  • Full-time
Job description

About us

Our mission is to build causal intelligence, starting with physics models to predict and control the weather.

We're building a small team driven by a deep passion and urgency to solve this civilizationally important problem.

Our founding team has led & shipped models across self‑driving cars, humanoid robotics, protein folding, and video generation at world‑class institutions including Google DeepMind, Cruise, Waymo, Meta, Nabla Bio, and Apple.

Responsibilities

  • Design, deploy, and maintain large distributed ML training and inference clusters
  • Develop efficient, scalable end‑to‑end pipelines to manage petabyte‑scale datasets and model training throughout the entire ML lifecycle
  • Research and test various training approaches including parallelization techniques and numerical precision trade‑offs across different model scales
  • Analyze, profile, and debug low‑level GPU operations to optimize performance
  • Stay up‑to‑date on research to bring new ideas to work

What we’re looking for

We value a relentless approach to problem‑solving, rapid execution, and the ability to quickly learn in unfamiliar domains.

  • Strong grasp of state‑of‑the‑art techniques for optimizing training and inference workloads
  • Demonstrated proficiency in training large foundation models with distributed training frameworks (e.g., FSDP, DeepSpeed)
  • Knowledge of cloud platforms (GCP, AWS, or Azure) and their ML / AI service offerings
  • Familiarity with containerization and orchestration frameworks (e.g., Kubernetes, Docker)
  • Background working on distributed task management systems and scalable model serving & deployment architectures
  • Understanding of monitoring, logging, observability, and version control best practices for ML systems

You don’t have to meet every single requirement above.

Benefits

  • Work on deeply challenging, unsolved problems
  • Competitive cash and equity compensation
  • Medical, dental, and vision insurance
  • Catered lunch & dinner
  • Unlimited paid time off
  • Visa sponsorship & relocation support
