A company is looking for a Staff Reliability Engineer to shape the foundation of its Site Reliability Engineering practice.
Key Responsibilities
Define and evolve the reliability strategy, establishing SLOs, SLIs, and error budgets
Architect and scale resilient distributed systems, partnering with engineering teams on their design and operation
Drive observability and operational excellence by enhancing metrics, logging, and tracing for actionable insights
Required Qualifications
8-10+ years of experience in systems, infrastructure, or backend engineering
Proven experience in defining and delivering reliability outcomes through SLOs and observability practices
Strong background in infrastructure-as-code, scripting, and automation
Experience in incident management and in driving postmortems that lead to reliability improvements
Demonstrated ability to mentor engineers and influence cross-functional teams
Site Reliability Engineer • Provo, Utah, United States