Talent.com
Staff Site Reliability Engineer

Elios Talent • San Francisco, CA, United States
16 days ago
Job type
  • Full-time
Job description

Overview

Title: Staff Site Reliability Engineer

Location: Flexible / Remote

Employment Type: Full-Time

Compensation: $135,000 – $220,000

Role Summary

We are seeking a Staff Site Reliability Engineer (SRE) to ensure the availability, scalability, and performance of mission-critical systems. You will design disaster recovery processes, implement observability and alerting frameworks, and lead incident response efforts. This role combines system design expertise with a focus on automation, empowering teams to operate large-scale distributed environments efficiently and securely.

Key Responsibilities

  • Design and maintain highly available, large-scale distributed systems.
  • Lead disaster recovery planning, execution, and continuous improvement.
  • Implement observability, monitoring, and alerting solutions.
  • Drive incident response, root cause analysis, and post-mortem reviews.
  • Build automation tools to optimize system operations and reduce manual tasks.
  • Collaborate with engineering teams to embed reliability best practices.

Requirements

  • 6+ years of experience in Site Reliability Engineering or related roles.
  • Expertise in system design and distributed system architecture.
  • Proficiency in Go and Python for automation and tooling.
  • Strong knowledge of Kubernetes and container orchestration.
  • Experience with observability tools (monitoring, logging, and tracing).
  • Proven ability to lead incident response and drive reliability culture.

About the Opportunity

This role is ideal for an experienced engineer who thrives on ensuring reliability at scale. You will lead critical system initiatives, mentor teams, and implement automation to support resilient operations.

Why Join

  • High-impact role at the intersection of reliability and scalability.
  • Competitive compensation and leadership visibility.
  • Opportunity to shape operational excellence and system resiliency.