Site Reliability Engineer — Scale Resilience & Observability

Happyrobot Inc. • San Francisco, CA, United States

30+ days ago

Job type

Full-time

Job description

A leading AI startup based in San Francisco is seeking a Site Reliability Engineer to strengthen operational resilience. In this role, you will own stability, observability, and debugging workflows, turning complex failures into smooth operations. Ideal candidates have 3+ years of experience debugging production systems and are comfortable coding in Python and Go. Excellent problem-solving skills and the ability to communicate clearly under pressure are essential. Join a passionate team and help shape the future of AI-driven enterprises.

Similar jobs

Senior Site Reliability Engineer

Mvp VC • San Francisco, CA, United States

Full-time

Loft Orbital is revolutionizing access to space by building reliable, shareable satellites that drastically reduce the time and complexity traditionally required to get to orbit. We operate satellit...

Last updated: 30+ days ago • Promoted

Site Reliability Engineer

Attain • Redwood City, CA, United States

Full-time

Built for consumers and companies alike. In a world driven by data, we believe consumers and businesses can coexist. Our founders had a vision to empower consumers to leverage their greatest asset—t...

Last updated: 30+ days ago • Promoted

CloudDevs: Senior Site Reliability Engineer (SRE)

Breakout Tools • San Francisco, CA, United States

Full-time

CloudDevs works with fast-moving, venture-backed startups across the US. We’re building a pool of world-class Site Reliability Engineers for current roles and for upcoming opportunities. You will eit...

Last updated: 30+ days ago • Promoted

Senior Site Reliability Engineer

Motive Software • San Francisco, CA, United States

Full-time

Senior Site Reliability Engineer. Let’s face it, a company whose mission is human transformation better have some fresh thinking about the employer / employee relationship. We can’t cram it all in here...

Last updated: 30+ days ago • Promoted

Senior Site Reliability Engineer – Platform

Icon Ventures • San Francisco, CA, United States

Full-time

At Quizlet, our mission is to help every learner achieve their outcomes in the most effective and delightful way. We blend cognitive science with machine learning to personalize and enhance the lear...

Last updated: 30+ days ago • Promoted

Site Reliability Engineer

Together AI • San Francisco, CA, United States

Full-time

As a Site Reliability Engineer (SRE) at Together, you are responsible for keeping all user-facing services and production systems running smoothly. You are a blend of a pragmatic operator and a soft...

Last updated: 30+ days ago • Promoted

Site Reliability Engineer

Mercor • San Francisco, CA, United States

Full-time

Mercor is at the intersection of labor markets and AI research. We partner with leading AI labs and enterprises to provide the human intelligence essential to AI development. Our vast talent network ...

Last updated: 30+ days ago • Promoted

Senior Site Reliability Engineer

Alembic Technologies • San Francisco, CA, United States

Full-time

Senior Site Reliability Engineer. This range is provided by Alembic Technologies. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more. We’re looking fo...

Last updated: 30+ days ago • Promoted

Site Reliability Engineer

Canonical • San Francisco, CA, United States

Full-time

Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiat...

Last updated: 30+ days ago • Promoted

Senior Site Reliability Engineer

ZetaChain • San Francisco, CA, United States

Full-time

We're building something ambitious at ZetaChain: the first universal blockchain and AI platform that connects everything—Bitcoin, Ethereum, Solana, and more—while pioneering in the GenAI space. We'r...

Last updated: 24 days ago • Promoted

Site Reliability Engineer - Scale & Observability

gamma.app • San Francisco, CA, United States

Full-time

A dynamic tech firm located in San Francisco is seeking a Site Reliability Engineer to enhance operational health across their production systems. This high-impact role demands expertise in AWS and ...

Last updated: 30+ days ago • Promoted

Senior Site Reliability Engineer

Hive • San Francisco, CA, United States

Full-time

Hive is the leading provider of cloud-based AI solutions to understand, search, and generate content, and is trusted by hundreds of the world's largest and most innovative organizations. The company...

Last updated: 30+ days ago • Promoted

Site Reliability Engineer

Fractal • San Francisco, CA, United States

Full-time

This range is provided by Fractal. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more. Fractal Analytics is a strategic AI partner to Fortune 500 com...

Last updated: 30+ days ago • Promoted

Site Reliability Engineer

Programmers.io • San Francisco, CA, United States

Full-time

Okta: IAM / IGA, OAuth / SAML / OIDC, MFA / FIDO and Zero Trust principles. Azure: Entra ID, AKS, ARM templates, Azure Monitor, Policy. GCP: GAM, Cloud Operations Suite, Deployment Manager

Last updated: 1 day ago

Site Reliability Engineer

Bay Systems Consulting • Berkeley, CA, United States

Temporary

Site Reliability Engineer (SRE) role at Bay Systems Consulting. Location: Berkeley, CA (Onsite at Lawrence Berkeley National Laboratory). Employment Type: 5–6 Month Contract (Extension Possible). Pay ...

Last updated: 30+ days ago • Promoted

Senior Site Reliability Engineer

Unify • San Francisco, CA, United States

Full-time

Unify was founded January 17th, 2023 by Austin Hughes and Connor Heggie. Prior to Unify, Austin led Ramp’s growth product team focused on new customer acquisition, and Connor was a machine learning ...

Last updated: 30+ days ago • Promoted

Site Reliability Engineer (SRE)

Baseten • San Francisco, CA, United States

Full-time

Baseten powers inference for the world's most dynamic AI companies, like OpenEvidence, Clay, Mirage, Gamma, Sourcegraph, Writer, Abridge, Bland, and Zed. By uniting applied AI research, flexible inf...

Last updated: 30+ days ago • Promoted

Site Reliability Engineer

Clay • San Francisco, CA, United States

Full-time

Clay is a creative tool for growth. Our mission is to help businesses grow — without huge investments in tooling or manual labor. We’re already helping over 100,000 people grow their business with Cl...

Last updated: 30+ days ago • Promoted