Talent.com
Staff ML Infrastructure Engineer

Cubiq Recruitment • Hayward, CA, US
15 days ago
Job type
  • Full-time
Job description

Staff / Lead ML Infrastructure Engineer

San Francisco, CA — Onsite

Salary: above market average, plus equity

We are building one of the world's leading generative video and multimodal AI platforms, and we're looking for a senior infrastructure engineer to drive the backbone that makes it possible. This role is ideal for an engineer from a top-tier tech company who has built cloud-scale systems, high-performance compute platforms, and battle-tested CI/CD pipelines that support complex ML workloads.

What You'll Own

  • Core ML Platform Architecture: Design and evolve the infrastructure that supports large-scale generative video and multimodal model training, evaluation, and deployment.
  • High-Throughput Compute Systems: Build and optimize GPU/TPU clusters, distributed training systems, and orchestration layers tailored for video-heavy pipelines.
  • Production Reliability for Generative Models: Create the tooling and services needed to safely push frequent model updates while handling massive compute loads and long-running jobs.
  • End-to-End CI/CD for ML: Lead the development of automated pipelines for model training, validation, artifact management, and production rollout.
  • Multimodal Data Infrastructure: Build systems to ingest, version, transform, and serve large-scale video, audio, and text datasets with high reliability.
  • Internal Developer Experience: Partner with research, product, and applied ML teams to build intuitive internal tooling for experiment tracking, model lineage, and resource scheduling.
  • Technical Leadership: Mentor engineers, set platform standards, and influence long-term architectural direction.

What You've Done

  • Experience architecting and operating large-scale infrastructure at a cloud provider, hyperscaler, or leading AI company.
  • Built or owned mission-critical CI/CD systems, high-capacity compute platforms, or data infrastructure supporting ML teams.
  • Deep experience with distributed compute across GPUs/accelerators, Kubernetes, and cloud infrastructure (AWS/GCP/Azure).
  • Strong engineering fundamentals in Python, Go, or equivalent languages.
  • Previous exposure to ML training pipelines—especially systems that handle heavy video, multimodal, or high-dimensional data.
  • Demonstrated ability to lead complex cross-org initiatives and drive technical strategy.

Nice to Have

  • Experience with video processing systems, large-scale media pipelines, or streaming architectures.
  • Familiarity with modern multimodal or video-generation frameworks (PyTorch, JAX, diffusers, custom accelerators).
  • Experience with Ray, Triton, CUDA optimization, or specialized scheduling for ML workloads.
  • Background working in high-growth AI startups or research-focused environments.
  • Familiarity with security and compliance considerations for models that generate or process user content.

Why Join

  • Shape the underlying platform powering one of the most advanced generative video systems in the world.
  • Influence the future of multimodal AI by building infrastructure that directly accelerates research and product breakthroughs.
  • Work closely with experienced founding engineers, researchers, and platform builders from leading tech companies.
  • Highly competitive compensation, meaningful equity, and strong in-person engineering culture in San Francisco.