Founding Site Reliability Engineer (Remote - US)

Jobgether • San Francisco, CA, United States

10 hours ago

Job type

Full-time

Remote

Job description

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Founding Site Reliability Engineer in the United States.

This is a unique opportunity to join a rapidly growing AI company as the first SRE hire in the San Francisco office. In this role, you will define and scale the Site Reliability Engineering discipline, ensuring the platform is reliable, secure, and performant at enterprise scale. You will work closely with engineering leads, product teams, and company founders to build infrastructure, establish best practices, and drive the organization’s reliability culture. The role involves hands‑on system design, automation, and observability work, while providing leadership and strategic input to shape long‑term operational excellence. Ideal candidates are technically strong, highly collaborative, and motivated by building world‑class systems from the ground up.

Accountabilities

Establish and scale the SRE discipline, including best practices, tooling, and culture.
Ensure 99.9% uptime of production systems and maintain global platform reliability.
Architect, automate, and manage AWS infrastructure using Terraform, CI/CD pipelines, and Infrastructure as Code.
Design and implement observability systems across microservices, APIs, and vector workloads, including metrics, tracing, and logging.
Lead incident management, reducing MTTR through runbooks, alerts, and postmortems.
Collaborate with engineering teams to embed reliability principles into the software development lifecycle.
Influence organizational strategy and culture as a founding voice in the engineering team.

Qualifications

5+ years of experience in SRE, DevOps, or infrastructure roles, ideally in enterprise SaaS environments.

Expertise in AWS services (EC2, ECS/EKS, Lambda, RDS, VPC, IAM).

Proven experience with Infrastructure as Code (Terraform, Kubernetes/EKS, CDK, or CloudFormation).

Hands‑on experience with observability and monitoring stacks (CloudWatch, Grafana, Prometheus, Datadog).

Experience in incident management, on‑call responsibilities, and postmortem‑driven reliability improvements.

Bonus: exposure to AI/ML platforms, data‑heavy systems, or multi‑agent workloads.

Strong problem‑solving, communication, and collaboration skills.

Benefits

Competitive salary and equity options.

Health, dental, and vision insurance, including dependent coverage.

Paid time off and holidays, with parental leave benefits.

401(k) plan and other financial perks.

Opportunity to shape company culture and systems at a high‑growth AI startup.

Thank you for your interest!
