
Head of ML Cloud Platform

UniversalAGI • San Francisco, CA, US

1 day ago

Job type

Full-time

Job description

San Francisco | Work Directly with CEO & founding team | Report to CEO | OpenAI for Physics | 5 Days Onsite

Location: Onsite in San Francisco

Compensation: Competitive salary and significant equity

Who We Are

UniversalAGI is building OpenAI for Physics. We are an AI startup based in San Francisco, backed by Elad Gil (#1 Solo VC), Eric Schmidt (former Google CEO), Prith Banerjee (ANSYS CTO), Ion Stoica (Databricks Founder), Jared Kushner (former Senior Advisor to the President), David Patterson (Turing Award Winner), and Luis Videgaray (former Foreign and Finance Minister of Mexico). We're building foundation AI models for physics that enable end-to-end industrial automation, from initial design through optimization, validation, and production.

We're building a high-velocity team of relentless researchers and engineers that will define the next generation of AI for industrial engineering. If you're passionate about AI, physics, or the future of industrial innovation, we want to hear from you.

About the Role

As the Head of ML Cloud Platform, you'll be in the arena from day one, building and leading the team that creates the backbone for AI-powered physics simulation at scale. This is your chance to own the entire ML infrastructure vision—from training foundation models on petabytes of CFD data to deploying them into mission-critical automotive and maritime production environments.

You'll work directly with the CEO and founding team to build a world-class ML platform organization, recruiting exceptional engineers and researchers while remaining deeply technical yourself. You'll architect systems that train models faster, serve predictions with lower latency, and integrate seamlessly into customers' existing CAE workflows—all while managing a team that ships with the velocity of a startup and the rigor of enterprise infrastructure.

This isn't a pure management role. You're a technical leader who codes, debugs production incidents at 2 AM when needed, and earns respect through hands-on contribution while simultaneously building the team and culture that will scale our platform to serve the world's largest industrial companies.

What You'll Do

Technical Leadership & Architecture

Define the ML platform vision: Architect the end-to-end infrastructure strategy for training, fine-tuning, serving, and deploying foundation models for physics simulation across cloud and on-premise environments

Build for scale and reliability: Design systems that can handle petabyte-scale CFD datasets, multi-day distributed training runs, and real-time inference for customers making million-dollar engineering decisions

Stay hands-on: Write code, debug critical production issues, review pull requests, and make key architectural decisions yourself; you're a technical leader who leads by doing

Bridge research and production: Translate cutting-edge research from our deep learning team into production-grade infrastructure that customers can depend on

Integrate with CAE ecosystems: Ensure our platform works seamlessly with existing simulation tools (Ansys, OpenFOAM, STAR-CCM+), HPC clusters, PLM systems, and enterprise security requirements

Team Building & Management

Recruit world-class talent: Build a team of exceptional ML infrastructure engineers, cloud platform engineers, and MLOps specialists who can execute at the highest level

Develop and mentor: Coach engineers to grow technically and professionally, fostering a culture of deep work, technical excellence, and customer obsession

Scale the organization: Grow the team from founding engineers to a robust platform organization as we scale from early customers to enterprise deployments

Set technical standards: Establish engineering practices, code review processes, and quality bars that enable the team to ship fast without breaking things

Foster collaboration: Work closely with deep learning researchers, product engineers, CFD domain experts, and customer success to ensure platform capabilities align with company needs

Execution & Delivery

Ship relentlessly: Drive the team to deliver infrastructure from prototype to production in weeks, not quarters, iterating based on real customer feedback

Own reliability: Take responsibility for platform uptime, performance, and customer success; when things break, you're in the arena fixing them

Make strategic tradeoffs: Balance innovation with stability, speed with quality, and custom solutions with scalable platforms

Work with customers: Engage directly with automotive and maritime customers to understand their infrastructure requirements, security constraints, and deployment challenges

Build for enterprise: Implement security, compliance, monitoring, and operational practices that meet the standards of Fortune 500 companies

Qualifications

Required Experience

8+ years in ML infrastructure or cloud platform engineering, with at least 3 years in technical leadership roles managing high-performing teams

Proven track record building and scaling ML platforms for training, serving, or deploying models in production environments, ideally at AI-first companies

Deep technical expertise in distributed training (PyTorch Distributed, DeepSpeed, Ray), cloud infrastructure (AWS / GCP / Azure), and container orchestration (Kubernetes, Docker)

Hands-on coding ability: Expert-level Python and infrastructure-as-code skills; you can still ship production code yourself and review your team's work deeply

Team building success: Track record of recruiting, developing, and retaining exceptional engineering talent, with experience growing teams from 3-4 engineers to 15-20

Strong product and customer intuition: Experience working closely with customers, understanding their workflows, and translating requirements into technical solutions

Outstanding execution velocity: Proven ability to ship infrastructure rapidly in fast-paced, high-growth environments while maintaining quality

Technical Requirements

ML infrastructure mastery: Deep understanding of training pipelines, model serving, distributed systems, GPU optimization, and the full ML lifecycle

Cloud platform expertise: Strong experience with cloud providers, infrastructure-as-code tools, and building hybrid cloud / on-premise solutions

System design excellence: Can architect complex, scalable systems and make smart tradeoff decisions under uncertainty

Performance optimization: Knowledge of GPU programming, model optimization techniques, and infrastructure cost management

Enterprise infrastructure: Experience with security, compliance, SSO, RBAC, and deploying into regulated or air-gapped environments

Leadership & Communication

Technical credibility: Earns respect through deep technical contribution, not just title or tenure

Clear communicator: Can explain complex technical decisions to customers, executives, researchers, and engineers at all levels

Strategic thinker: Balances short-term execution with long-term platform vision and architectural decisions

Player-coach mentality: Comfortable coding and debugging yourself while also managing, mentoring, and growing a team

High agency: Takes ownership of outcomes, doesn't wait for permission, and drives solutions to completion

Bonus Qualifications

Experience in industrial or scientific ML: Built infrastructure for physics simulation, computational chemistry, drug discovery, or other scientific computing domains

CAE / HPC background: Familiarity with simulation software, job schedulers (SLURM, PBS), parallel file systems, or high-performance computing environments

Founded or led platform teams at AI startups (Seed to Series B) through rapid growth and scaling challenges

Published or presented on ML infrastructure, distributed training, or MLOps topics at major conferences or venues

Experience with foundation models: Built infrastructure for training or serving large-scale pretrained models (LLMs, vision models, multimodal models)

Open-source contributions to major ML infrastructure projects (PyTorch, Ray, Kubernetes, MLflow, etc.)

PhD or MS in Computer Science, ML, or related field (or equivalent industry experience)

Enterprise B2B experience: Sold to or deployed infrastructure for Fortune 500 customers with complex security and compliance requirements

Cultural Fit

Technical Respect: Ability to earn respect through hands-on technical contribution, not just management authority

Intensity: Thrives in our unusually intense culture, is willing to grind when needed, and expects the same from your team

Customer Obsession: Passionate about solving real customer problems and building infrastructure that enables their success

Deep Work: Values long, uninterrupted periods of focused work and fosters this culture in your team

High Availability: Ready to be deeply involved whenever critical issues arise, whether that's at 2 AM or on weekends

Communication: Can translate complex technical concepts to diverse audiences and bridge engineering, research, and business

Growth Mindset: Embraces continuous learning and develops this mindset in your team

Startup Mindset: Comfortable with ambiguity, rapid change, and wearing multiple hats; you're a builder first, manager second

Work Ethic: Puts in extra hours when needed to hit critical milestones and holds your team to high standards

Low Ego, High Accountability: Collaborative leadership style with a focus on outcomes over personal credit

What We Offer

Build the foundation: Shape the ML platform strategy for a rapidly growing foundational AI company from the ground up

Real-world impact: See your infrastructure power physics simulations that optimize automotive aerodynamics, maritime vessel design, and other critical engineering applications

Direct CEO collaboration: Work closely with the founder & CEO, influence company strategy, and have your voice heard on major decisions

Exceptional team: Recruit and work with world-class deep learning researchers, CFD experts, and infrastructure engineers

Competitive compensation: Base salary plus significant equity upside as a founding leadership hire

In-person culture: 5 days a week in office with a team that values face-to-face collaboration, deep technical discussions, and building together

World-class network: Access to our investors and advisors, including Eric Schmidt, Elad Gil, Ion Stoica, David Patterson, and others

Benefits

Competitive compensation and equity

Competitive health, dental, and vision benefits paid by the company

401(k) plan offering

Flexible vacation

Team Building & Fun Activities

Great scope, ownership and impact

AI tools stipend

Monthly commute stipend

Monthly wellness / fitness stipend

Daily office lunch & dinner covered by the company

Immigration support

How We're Different

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood; who strives valiantly; who errs, who comes short again and again… who at the best knows in the end the triumph of high achievement, and who at the worst, if he fails, at least fails while daring greatly." – Teddy Roosevelt

At our core, we believe in being "in the arena." We are builders, problem solvers, and risk-takers who show up every day ready to put in the work: to sweat, to struggle, and to push past our limits. We know that real progress comes with missteps, iteration, and resilience. We embrace that journey fully, knowing that daring greatly is the only way to create something truly meaningful.

If you're ready to build the ML platform that will revolutionize physics simulation, lead a world-class team, and deliver transformative impact to industrial engineering, UniversalAGI is the place for you.
