Talent.com
Site Reliability Engineer HPC & Distributed Compute
Boom Supersonic • Denver, Colorado, United States
30+ days ago
Job type
  • Full-time
  • Permanent
Job description

Help Supersonic Software Take Flight

At Boom, we're scaling supersonic innovation. That means pushing petabytes, taming thousands of cores, and building a compute backbone that lets our engineers simulate, analyze, and design the next era of aviation faster than ever. Sound like your kind of puzzle?

As a Site Reliability Engineer, you’ll sit at the intersection of aerospace and infrastructure—building the environments that keep Boom’s engineers moving faster than the speed of sound. From auto-scaling cloud systems to hands-free Linux workstation provisioning, you’ll streamline and safeguard everything behind the scenes. You’ll work shoulder-to-shoulder with engineering users, solving tough problems, and shipping tools that make supersonic development possible.

This isn’t just uptime and metrics—this is aviation-grade reliability. If that sounds exciting, we’re ready for you to dive in.

Role Overview

  • Architect and scale our on-prem and cloud-based HPC infrastructure—supporting GPU, CPU, and hybrid workflows
  • Optimize job scheduling and distributed workload management (e.g., SLURM, AWS Batch, Kubernetes) for massively parallel simulations
  • Engineer storage solutions that balance IOPS, throughput, and cost—across object, block, and parallel file systems
  • Embed with simulation and data teams to understand real bottlenecks—and then eliminate them
  • Level up observability across dozens of internal applications—unifying monitoring, alerting, and diagnostics into a single view
  • Automate everything from Linux workstation provisioning to dependency management and source-control enforcement
  • Own infrastructure reliability across cloud (AWS) and on-prem environments
  • Automate everything: deployments, upgrades, health checks, and recovery processes
  • Collaborate with aerospace engineers and IT partners to eliminate friction and reduce failure modes
  • Champion SRE best practices, mentoring teammates and influencing broader software lifecycle strategy

Ideal Candidate

  • Professional experience in a blend of Linux systems administration and software development
  • Write clean, maintainable code (especially in Python and Bash; Go experience is a plus) in structured, team-oriented development environments with code review and source control
  • Have deployed and monitored distributed systems, such as microservices or client/server architectures
  • Hands-on experience designing and managing petabyte-scale storage systems (Lustre, BeeGFS, Ceph, ZFS)
  • Know how to wrangle fleets of Linux workstations with configuration management and automation tools
  • Familiarity with containerization (Docker, Singularity) and infrastructure-as-code (Terraform, Ansible, CDK)
  • Are comfortable coordinating backups and disaster recovery with IT stakeholders
  • Comfortable navigating fast-paced environments and high-ownership teams
  • Are endlessly curious and hungry to learn—especially about aerospace systems and the people building them

What Will Set You Apart

  • Prior experience in aerospace, defense, biotech, or other simulation-intensive industries, supported by large-scale, auto-scaling infrastructure
  • Familiarity with EDA, CAE, or CFD pipelines and their unique compute/storage needs
  • You’ve debugged distributed or threaded code, like goroutines or similar
  • You’ve built notification tooling that integrates with Slack, SMS, or email
  • You’ve hosted and secured modern SPAs and APIs in production environments
  • You’ve improved performance with distributed caching and content delivery strategies
  • Fearless curiosity—you chase down obscure kernel tuning flags and understand what they do
  • History of mentoring others in system reliability, automation, or performance optimization

Compensation

The Base Salary Range for this position is $140,000 - $177,000 per year. Actual salaries will vary based on factors including but not limited to location, experience, and performance. The range listed is just one component of Boom’s total rewards package for employees. Other rewards may include long-term incentives/equity, a flexible PTO policy, and many other progressive benefits.

There is no set deadline to apply for this job opportunity. Applications will be accepted on an ongoing basis until the search is no longer active.

ITAR Requirement

To conform to U.S. Government aerospace technology export regulations (ITAR and EAR), applicant must be a U.S. citizen, lawful permanent resident of the U.S., protected individual as defined by 8 U.S.C. 1324b(a)(3), or eligible to obtain the required authorizations from the U.S. Department of State.

Boom is an equal opportunity employer and we value diversity. All employment is decided on the basis of qualifications, merit and business need.

Want to build a faster future? Come join Boom.
