About us
Our mission is to build causal intelligence, starting with physics models to predict and control the weather.
We're building a small team driven by a deep passion and urgency to solve this civilizationally important problem.
Our founding team has led & shipped models across self‑driving cars, humanoid robotics, protein folding, and video generation at world‑class institutions including Google DeepMind, Cruise, Waymo, Meta, Nabla Bio, and Apple.
Responsibilities
- Design, deploy, and maintain large distributed ML training and inference clusters
- Develop efficient, scalable end‑to‑end pipelines to manage petabyte‑scale datasets and model training throughout the entire ML lifecycle
- Research and test various training approaches including parallelization techniques and numerical precision trade‑offs across different model scales
- Analyze, profile, and debug low‑level GPU operations to optimize performance
- Stay up to date on research and bring new ideas into our work
What we’re looking for
We value a relentless approach to problem‑solving, rapid execution, and the ability to quickly learn in unfamiliar domains.
- Strong grasp of state‑of‑the‑art techniques for optimizing training and inference workloads
- Demonstrated proficiency with distributed training frameworks (e.g., FSDP, DeepSpeed) to train large foundation models
- Knowledge of cloud platforms (GCP, AWS, or Azure) and their ML / AI service offerings
- Familiarity with containerization and orchestration frameworks (e.g., Kubernetes, Docker)
- Background working on distributed task management systems and scalable model serving & deployment architectures
- Understanding of monitoring, logging, observability, and version control best practices for ML systems
You don’t have to meet every single requirement above.
Benefits
- Work on deeply challenging, unsolved problems
- Competitive cash and equity compensation
- Medical, dental, and vision insurance
- Catered lunch & dinner
- Unlimited paid time off
- Visa sponsorship & relocation support