Software Engineer, Site Reliability (SRE)
Software Engineer, Site Reliability (SRE) at Sierra Business Solution.
About Us
- We are an in‑person company based in San Francisco with growing offices in Atlanta, New York, and London, building a platform that helps businesses create better, more human customer experiences with AI.
- Our core values are Trust, Customer Obsession, Craftsmanship, Intensity, and Family.
- Company founders: Bret Taylor, former Salesforce and Facebook executive, and Clay Bavor, former Google Labs leader.
What You’ll Do
- Own Sierra’s observability stack (monitoring, alerting, logging, and tracing) to give engineers clear visibility into system health and performance.
- Partner with product and platform engineers to design reliable, scalable systems from day one.
- Design and implement scalable, secure cloud infrastructure (AWS) using Terraform and modern DevOps tooling.
- Improve reliability and scalability of LLM deployments, ensuring robust, cost‑effective operation.
- Lead improvements to deployment pipelines, CI/CD tooling, and incident‑management processes.
- Define the foundation of SRE practices at Sierra, influencing culture, tooling, and best practices.
What You’ll Bring
- 5+ years of hands‑on experience in Site Reliability or infrastructure engineering for complex SaaS or cloud‑based systems.
- Experience designing for availability, scalability, and reliability at both infrastructure and application layers.
- Deep experience with Terraform, AWS services, container orchestration, and cloud networking (IAM, VPC).
- Strong background in observability systems (Prometheus, Grafana, Datadog, or similar).
- Experience working with enterprise customers and familiarity with compliance and networking needs.
- Comfortable working in fast‑moving environments and collaborating across teams.
- Degree in Computer Science or equivalent professional experience.
Even Better
- Experience with LLM infrastructure: optimizing inference, managing fine‑tuned models, or large‑scale deployment.
- Early‑stage startup experience defining SRE culture and tooling from scratch.
- Familiarity with incident‑management automation or self‑healing infrastructure patterns.
Benefits
- Unlimited Paid Time Off
- Medical, Dental, and Vision benefits
- Life Insurance and Disability Benefits
- 401(k) retirement plan with company match
- Parental Leave and fertility benefits via Carrot
- Lunch, snacks, coffee, and discretionary stipend
- Equity plans per applicable policies
Equality & Diversity
We actively encourage applicants of all backgrounds to apply. We strive to evaluate all applicants consistently without regard to race, color, religion, gender, sexual orientation, age, disability, veteran status, or any other protected characteristic.