A company is looking for a Site Reliability Engineer to ensure the reliability and performance of its multi-tenant SaaS platforms.
Key Responsibilities
Define and implement SLIs, SLOs, and error budgets for critical services
Design systems for reliability, scalability, and fault tolerance
Conduct capacity planning and lead the response to production incidents
Required Qualifications
3+ years of experience as an SRE, DevOps Engineer, or Production Engineer
Proven experience with highly available, enterprise-grade, multi-tenant SaaS platforms
Hands-on experience with observability and monitoring tools
Solid understanding of Linux, networking, and distributed systems
Experience with container technologies such as Docker and Kubernetes
Site Reliability Engineer • Alexandria, Virginia, United States