Staff ML Infrastructure Engineer

Cubiq Recruitment, Sunnyvale, CA, US

2 days ago

Job type

Full-time

Job description

Staff / Lead ML Infrastructure Engineer

San Francisco, CA — Onsite

Salary: Above market average, plus equity

We are building one of the world’s leading generative video and multimodal AI platforms, and we’re looking for a senior infrastructure engineer to drive the backbone that makes it possible. This role is ideal for an engineer from a top-tier tech company who has built cloud-scale systems, high-performance compute platforms, and battle-tested CI/CD pipelines that support complex ML workloads.

What You’ll Own

Core ML Platform Architecture: Design and evolve the infrastructure that supports large-scale generative video and multimodal model training, evaluation, and deployment.
High-Throughput Compute Systems: Build and optimize GPU/TPU clusters, distributed training systems, and orchestration layers tailored for video-heavy pipelines.
Production Reliability for Generative Models: Create the tooling and services needed to safely push frequent model updates while handling massive compute loads and long-running jobs.
End-to-End CI/CD for ML: Lead the development of automated pipelines for model training, validation, artifact management, and production rollout.
Multimodal Data Infrastructure: Build systems to ingest, version, transform, and serve large-scale video, audio, and text datasets with high reliability.
Internal Developer Experience: Partner with research, product, and applied ML teams to build intuitive internal tooling for experiment tracking, model lineage, and resource scheduling.
Technical Leadership: Mentor engineers, set platform standards, and influence long-term architectural direction.

What You’ve Done

Experience architecting and operating large-scale infrastructure at a cloud provider, hyperscaler, or leading AI company.

Built or owned mission-critical CI/CD systems, high-capacity compute platforms, or data infrastructure supporting ML teams.

Deep experience with distributed compute across GPUs/accelerators, Kubernetes, and cloud infrastructure (AWS/GCP/Azure).

Strong engineering fundamentals in Python, Go, or equivalent languages.

Previous exposure to ML training pipelines—especially systems that handle heavy video, multimodal, or high-dimensional data.

Demonstrated ability to lead complex cross-org initiatives and drive technical strategy.

Nice to Have

Experience with video processing systems, large-scale media pipelines, or streaming architectures.

Familiarity with modern multimodal or video-generation frameworks (PyTorch, JAX, diffusers) and with custom accelerators.

Experience with Ray, Triton, CUDA optimization, or specialized scheduling for ML workloads.

Background working in high-growth AI startups or research-focused environments.

Experience with security and compliance considerations for models that generate or process user content.

Why Join

Shape the underlying platform powering one of the most advanced generative video systems in the world.

Influence the future of multimodal AI by building infrastructure that directly accelerates research and product breakthroughs.

Work closely with experienced founding engineers, researchers, and platform builders from leading tech companies.

Highly competitive compensation, meaningful equity, and a strong in-person engineering culture in San Francisco.
