Founding Site Reliability Engineer
Relevance AI • San Francisco, CA, United States
30+ days ago
Job type
  • Full-time
Job description

Location 📍 : San Francisco, USA (Hybrid 3 days / week)

About Us 🚀

At Relevance AI, our mission is to empower anyone to delegate work to the AI workforce. We’re building a new category of AI automation, enabling teams to create and deploy intelligent AI agents that replicate human-quality work, decision-making, and collaboration at scale.

We’re scaling fast, backed by top global investors including Bessemer Venture Partners, Insight Partners, Peak XV, and King River Capital, and our platform is already trusted by industry leaders like Canva, Databricks, Confluent, KPMG, Autodesk, and more. With offices in Sydney 🇦🇺 and San Francisco 🇺🇸 (and a new hub launching in Barcelona 🇪🇸), this is your chance to shape the future of work on a global stage.

The Role 🧠

We’re looking for a Founding Site Reliability Engineer to join us as our first SRE hire in San Francisco. We are open to hiring at the Senior, Lead, or Principal level, and the level will be candidate-led. This role is perfect for someone ready to establish and scale the SRE discipline from the ground up in one of the fastest-growing AI companies globally.

You’ll own the reliability, scalability, and security of our platform as we power tens of thousands of multi-agent workloads across multiple regions. You’ll partner closely with our founders, engineering leads, and product teams to define our reliability culture, shape long-term strategy, and build world-class infrastructure for enterprise scale.

What You’ll Do 💪

Own SRE, establishing best practices, tooling, and culture

Tackle reliability challenges unique to multi-agent orchestration at enterprise scale

Guarantee >99.9% uptime of production systems, ensuring reliability at global scale

Architect and automate AWS infrastructure with Terraform and CI/CD pipelines

Design observability systems across microservices, APIs, and vector infrastructure (metrics, tracing, logging)

Drive down incidents and MTTR through runbooks, alerting, and incident response excellence

Help scale infra to support hundreds of thousands of agents and billions of API calls

Partner with engineering teams to embed SRE principles into the SDLC and shape org-wide reliability strategy

Act as a founding voice in our SF office, influencing product direction and engineering culture

What We’re Looking For 🧠

5+ years in SRE / DevOps / Infrastructure roles, with experience in enterprise SaaS environments.

Deep AWS expertise (EC2, ECS / EKS, Lambda, RDS, VPC, IAM).

Proven track record with Infrastructure as Code (Terraform, Kubernetes / EKS, CDK, or CloudFormation).

Hands-on with observability stacks (CloudWatch, Grafana, Prometheus, Datadog).

Incident management experience in production SaaS systems, including on-call, postmortems, and reliability improvements.

Bonus: Prior exposure to AI / ML platforms, data-heavy systems, or multi-agent workloads.

Tech Stack 🧰

AWS, Kubernetes / EKS, Terraform, GitHub Actions, Postgres / Mongo, Prometheus / Grafana, CloudWatch, PagerDuty / BetterStack

