Senior Infrastructure Engineer - Supercomputing

Institute of Foundation Models, Sunnyvale, California, United States
30+ days ago
Job type
  • Full-time
Job description

About the Institute of Foundation Models

We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.

As part of our team, you’ll have the opportunity to work on the core of cutting-edge foundation model training, alongside world-class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem-solving skills will be instrumental in establishing MBZUAI as a global hub for high-performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.

The Role

We are operating some of the world’s largest GPU supercomputing clusters to support cutting-edge AI research and large-scale model deployment. We’re looking for an Infrastructure Engineer to join our core platform team to help build, operate, and scale our hybrid infrastructure across both on-prem and cloud environments.

This role is ideal for engineers who thrive at the intersection of distributed systems, cloud automation, and high-performance computing.

Key Responsibilities

  • Operate and scale high-performance GPU clusters used for AI training and production inference.
  • Manage infrastructure across on-premise (Slurm-based) HPC environments and cloud providers like AWS and Azure.
  • Implement and maintain Infrastructure as Code using Pulumi, Terraform, or Ansible.
  • Enhance and secure deployment pipelines using Kubernetes, Flux, and ArgoCD.
  • Help define and enforce security best practices for internal researchers and production services.
  • Continuously improve observability, resiliency, and operational tooling across environments.

Tech Stack

  • Kubernetes, Slurm
  • Pulumi, Terraform, Ansible
  • Rust and Go
  • Flux, ArgoCD
  • AWS, Azure

Professional Experience

  • Strong experience managing compute infrastructure in hybrid environments (on-prem and cloud).
  • Hands-on experience operating Slurm clusters at scale.
  • Proficiency in deploying and managing containerized applications, ideally written in Rust or Go.
  • Solid background in IaC and CI/CD best practices.
  • Experience working with GPU workloads or HPC infrastructure is a strong plus.
  • Familiarity with securing and monitoring multi-tenant compute environments.

$200,000 - $400,000 a year

Salary depends on level.

Visa Sponsorship

This position is eligible for visa sponsorship.

Benefits Include

  • Comprehensive medical, dental, and vision benefits
  • Bonus
  • 401(k) Plan
  • Generous paid time off, sick leave and holidays
  • Paid Parental Leave
  • Employee Assistance Program
  • Life insurance and disability