Talent.com
Site Reliability Engineer HPC & Distributed Compute
Boom Supersonic • Denver, Colorado, United States
30+ days ago
Job type
  • Full-time
  • Permanent
Job description

Help Supersonic Software Take Flight

At Boom, we're scaling supersonic innovation. That means pushing petabytes, taming thousands of cores, and building a compute backbone that lets our engineers simulate, analyze, and design the next era of aviation faster than ever. Sound like your kind of puzzle?

As a Site Reliability Engineer, you’ll sit at the intersection of aerospace and infrastructure—building the environments that keep Boom’s engineers moving faster than the speed of sound. From auto-scaling cloud systems to hands-free Linux workstation provisioning, you’ll streamline and safeguard everything behind the scenes. You’ll work shoulder-to-shoulder with engineering users, solving tough problems, and shipping tools that make supersonic development possible.

This isn’t just uptime and metrics—this is aviation-grade reliability. If that sounds exciting, we’re ready for you to dive in.

Role Overview

  • Architect and scale our on-prem and cloud-based HPC infrastructure—supporting GPU, CPU, and hybrid workflows
  • Optimize job scheduling and distributed workload management (e.g., SLURM, AWS Batch, Kubernetes) for massively parallel simulations
  • Engineer storage solutions that balance IOPS, throughput, and cost—across object, block, and parallel file systems
  • Embed with simulation and data teams to understand real bottlenecks—and then eliminate them
  • Level up observability across dozens of internal applications—unifying monitoring, alerting, and diagnostics into a single view
  • Automate everything from Linux workstation provisioning to dependency management and source-control enforcement
  • Own infrastructure reliability across cloud (AWS) and on-prem environments
  • Automate everything: deployments, upgrades, health checks, and recovery processes
  • Collaborate with aerospace engineers and IT partners to eliminate friction and reduce failure modes
  • Champion SRE best practices, mentoring teammates and influencing broader software lifecycle strategy

Ideal Candidate

  • Professional experience in a blend of Linux systems administration and software development
  • Write clean, maintainable code (especially in Python and Bash; Go experience is a plus) in structured, team-oriented development environments with code review and source control
  • Have deployed and monitored distributed systems, such as microservices or client/server architectures
  • Hands-on experience designing and managing petabyte-scale storage systems (Lustre, BeeGFS, Ceph, ZFS)
  • Know how to wrangle fleets of Linux workstations with configuration management and automation tools
  • Familiarity with containerization (Docker, Singularity) and infrastructure-as-code (Terraform, Ansible, CDK)
  • Are comfortable coordinating backups and disaster recovery with IT stakeholders
  • Comfortable navigating fast-paced environments and high-ownership teams
  • Are endlessly curious and hungry to learn—especially about aerospace systems and the people building them

What Will Set You Apart

  • Prior experience in aerospace, defense, biotech, or other simulation-intensive industries, supported by large-scale, auto-scaling infrastructure
  • Familiarity with EDA, CAE, or CFD pipelines and their unique compute/storage needs
  • You’ve debugged distributed or threaded code, like goroutines or similar
  • You’ve built notification tooling that integrates with Slack, SMS, or email
  • You’ve hosted and secured modern SPAs and APIs in production environments
  • You’ve improved performance with distributed caching and content delivery strategies
  • Fearless curiosity—you chase down obscure kernel tuning flags and understand what they do
  • History of mentoring others in system reliability, automation, or performance optimization

Compensation

    The base salary range for this position is $140,000 - $177,000 per year. Actual salaries will vary based on factors including but not limited to location, experience, and performance. The range listed is just one component of Boom’s total rewards package for employees. Other rewards may include long-term incentives/equity, a flexible PTO policy, and many other progressive benefits.

    There is no set deadline to apply for this job opportunity. Applications will be accepted on an ongoing basis until the search is no longer active.

    ITAR Requirement

    To conform to U.S. Government aerospace technology export regulations (ITAR and EAR), applicant must be a U.S. citizen, lawful permanent resident of the U.S., protected individual as defined by 8 U.S.C. 1324b(a)(3), or eligible to obtain the required authorizations from the U.S. Department of State.

    Boom is an equal opportunity employer and we value diversity. All employment is decided on the basis of qualifications, merit and business need.

    Want to build a faster future? Come join Boom.
