Talent.com
Site Reliability Engineer HPC & Distributed Compute

Boom Supersonic
Denver, Colorado, United States
30+ days ago
Job type
  • Full-time
  • Permanent
Job description

Help Supersonic Software Take Flight

At Boom, we're scaling supersonic innovation. That means pushing petabytes, taming thousands of cores, and building a compute backbone that lets our engineers simulate, analyze, and design the next era of aviation faster than ever. Sound like your kind of puzzle?

As a Site Reliability Engineer, you’ll sit at the intersection of aerospace and infrastructure, building the environments that keep Boom’s engineers moving faster than the speed of sound. From auto-scaling cloud systems to hands-free Linux workstation provisioning, you’ll streamline and safeguard everything behind the scenes. You’ll work shoulder-to-shoulder with engineering users, solving tough problems and shipping tools that make supersonic development possible.

This isn’t just uptime and metrics—this is aviation-grade reliability. If that sounds exciting, we’re ready for you to dive in.

Role Overview

  • Architect and scale our on-prem and cloud-based HPC infrastructure—supporting GPU, CPU, and hybrid workflows
  • Optimize job scheduling and distributed workload management (e.g., SLURM, AWS Batch, Kubernetes) for massively parallel simulations
  • Engineer storage solutions that balance IOPS, throughput, and cost—across object, block, and parallel file systems
  • Embed with simulation and data teams to understand real bottlenecks—and then eliminate them
  • Level up observability across dozens of internal applications—unifying monitoring, alerting, and diagnostics into a single view
  • Automate everything from Linux workstation provisioning to dependency management and source-control enforcement
  • Own infrastructure reliability across cloud (AWS) and on-prem environments
  • Automate everything: deployments, upgrades, health checks, and recovery processes
  • Collaborate with aerospace engineers and IT partners to eliminate friction and reduce failure modes
  • Champion SRE best practices, mentoring teammates and influencing broader software lifecycle strategy

Ideal Candidate

  • Professional experience in a blend of Linux systems administration and software development
  • Write clean, maintainable code (especially in Python and Bash; Go experience is a plus) in structured, team-oriented development environments with code review and source control
  • Have deployed and monitored distributed systems, such as microservices or client/server architectures
  • Hands-on experience designing and managing petabyte-scale storage systems (Lustre, BeeGFS, Ceph, ZFS)
  • Know how to wrangle fleets of Linux workstations with configuration management and automation tools
  • Familiarity with containerization (Docker, Singularity) and infrastructure-as-code (Terraform, Ansible, CDK)
  • Are comfortable coordinating backups and disaster recovery with IT stakeholders
  • Comfortable navigating fast-paced environments and high-ownership teams
  • Are endlessly curious and hungry to learn—especially about aerospace systems and the people building them

What Will Set You Apart

  • Prior experience in aerospace, defense, biotech, or other simulation-intensive industries, supported by large-scale, auto-scaling infrastructure
  • Familiarity with EDA, CAE, or CFD pipelines and their unique compute/storage needs
  • You’ve debugged distributed or threaded code, like goroutines or similar
  • You’ve built notification tooling that integrates with Slack, SMS, or email
  • You’ve hosted and secured modern SPAs and APIs in production environments
  • You’ve improved performance with distributed caching and content delivery strategies
  • Fearless curiosity—you chase down obscure kernel tuning flags and understand what they do
  • History of mentoring others in system reliability, automation, or performance optimization

Compensation

    The Base Salary Range for this position is $140,000 - $177,000 per year. Actual salaries will vary based on factors including but not limited to location, experience, and performance. The range listed is just one component of Boom’s total rewards package for employees. Other rewards may include long-term incentives/equity, a flexible PTO policy, and many other progressive benefits.

    There is no set deadline to apply for this job opportunity. Applications will be accepted on an ongoing basis until the search is no longer active.

    ITAR Requirement

    To conform to U.S. Government aerospace technology export regulations (ITAR and EAR), applicant must be a U.S. citizen, lawful permanent resident of the U.S., protected individual as defined by 8 U.S.C. 1324b(a)(3), or eligible to obtain the required authorizations from the U.S. Department of State.

    Boom is an equal opportunity employer and we value diversity. All employment is decided on the basis of qualifications, merit and business need.

    Want to build a faster future? Come join Boom.
