No longer accepting applications
Staff ML Infrastructure Engineer

Cubiq Recruitment, Sunnyvale, CA, US
2 days ago
Job type
  • Full-time
Job description

Staff / Lead ML Infrastructure Engineer

San Francisco, CA — Onsite

Salary: Above market average + equity

We are building one of the world’s leading generative video and multimodal AI platforms, and we’re looking for a senior infrastructure engineer to own the backbone that makes it possible. This role is ideal for an engineer from a top-tier tech company who has built cloud-scale systems, high-performance compute platforms, and battle-tested CI/CD pipelines that support complex ML workloads.

What You’ll Own

  • Core ML Platform Architecture: Design and evolve the infrastructure that supports large-scale generative video and multimodal model training, evaluation, and deployment.
  • High-Throughput Compute Systems: Build and optimize GPU/TPU clusters, distributed training systems, and orchestration layers tailored for video-heavy pipelines.
  • Production Reliability for Generative Models: Create the tooling and services needed to safely push frequent model updates while handling massive compute loads and long-running jobs.
  • End-to-End CI/CD for ML: Lead the development of automated pipelines for model training, validation, artifact management, and production rollout.
  • Multimodal Data Infrastructure: Build systems to ingest, version, transform, and serve large-scale video, audio, and text datasets with high reliability.
  • Internal Developer Experience: Partner with research, product, and applied ML teams to build intuitive internal tooling for experiment tracking, model lineage, and resource scheduling.
  • Technical Leadership: Mentor engineers, set platform standards, and influence long-term architectural direction.

What You’ve Done

  • Experience architecting and operating large-scale infrastructure at a cloud provider, hyperscaler, or leading AI company.
  • Built or owned mission-critical CI/CD systems, high-capacity compute platforms, or data infrastructure supporting ML teams.
  • Deep experience with distributed compute across GPUs/accelerators, Kubernetes, and cloud infrastructure (AWS/GCP/Azure).
  • Strong engineering fundamentals in Python, Go, or equivalent languages.
  • Previous exposure to ML training pipelines—especially systems that handle heavy video, multimodal, or high-dimensional data.
  • Demonstrated ability to lead complex cross-org initiatives and drive technical strategy.

Nice to Have

  • Experience with video processing systems, large-scale media pipelines, or streaming architectures.
  • Familiarity with modern multimodal or video-generation frameworks (PyTorch, JAX, diffusers, custom accelerators).
  • Experience with Ray, Triton, CUDA optimization, or specialized scheduling for ML workloads.
  • Background working in high-growth AI startups or research-focused environments.
  • Familiarity with security and compliance considerations for models that generate or process user content.

Why Join

  • Shape the underlying platform powering one of the most advanced generative video systems in the world.
  • Influence the future of multimodal AI by building infrastructure that directly accelerates research and product breakthroughs.
  • Work closely with experienced founding engineers, researchers, and platform builders from leading tech companies.
  • Highly competitive compensation, meaningful equity, and a strong in-person engineering culture in San Francisco.