Talent.com
Senior Infrastructure Engineer - Supercomputing

Institute of Foundation Models, Sunnyvale, CA, US
24 days ago
Job type
  • Full-time
Job description

About the Institute of Foundation Models

We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.

As part of our team, you’ll work at the core of cutting-edge foundation model training alongside world-class researchers, data scientists, and engineers, tackling some of the most fundamental and impactful challenges in AI development. You will help develop groundbreaking AI solutions with the potential to reshape entire industries. Strategic and innovative problem-solving will be instrumental in establishing MBZUAI (Mohamed bin Zayed University of Artificial Intelligence) as a global hub for high-performance computing in deep learning, driving discoveries that inspire the next generation of AI pioneers.

The Role

We operate some of the world’s largest GPU supercomputing clusters to support cutting-edge AI research and large-scale model deployment. We’re looking for an Infrastructure Engineer to join our core platform team and help build, operate, and scale our hybrid infrastructure across on-prem and cloud environments.

This role is ideal for engineers who thrive at the intersection of distributed systems, cloud automation, and high-performance computing.

Key Responsibilities

  • Operate and scale high-performance GPU clusters used for AI training and production inference.
  • Manage infrastructure across on-premises (Slurm-based) HPC environments and cloud providers such as AWS and Azure.
  • Implement and maintain Infrastructure as Code using Pulumi, Terraform, or Ansible.
  • Enhance and secure deployment pipelines using Kubernetes, Flux, and ArgoCD.
  • Help define and enforce security best practices for internal researchers and production services.
  • Continuously improve observability, resiliency, and operational tooling across environments.

Tech Stack

  • Kubernetes, Slurm
  • Pulumi, Terraform, Ansible
  • Rust and Go
  • Flux, ArgoCD
  • AWS, Azure

Professional Experience

  • Strong experience managing compute infrastructure in hybrid environments (on-prem and cloud).
  • Hands-on experience operating Slurm clusters at scale.
  • Proficiency in deploying and managing containerized applications, ideally written in Rust or Go.
  • Solid background in IaC and CI/CD best practices.
  • Experience working with GPU workloads or HPC infrastructure is a strong plus.
  • Familiarity with securing and monitoring multi-tenant compute environments.

Salary depends on level.

Visa Sponsorship

This position is eligible for visa sponsorship.

Benefits Include

  • Comprehensive medical, dental, and vision benefits
  • Bonus
  • 401K Plan
  • Generous paid time off, sick leave, and holidays
  • Paid parental leave
  • Employee Assistance Program
  • Life and disability insurance