Senior Site Reliability Engineer

Alembic • San Francisco, CA, United States

1 day ago

Job type

Full-time

Job description

About the Role

We’re looking for an experienced Site Reliability Engineer (SRE) to help us scale our platform with reliability, observability, and operational excellence at the core. You’ll partner with engineers and data scientists to build, automate, and maintain the infrastructure that powers our core platform—including data pipelines, ML workloads, and real‑time analytics systems.

This is a hands‑on, high‑impact role with visibility across the stack and the opportunity to shape the future of our infrastructure and operations.

Key Responsibilities

Design, build, and maintain scalable infrastructure to support real‑time analytics and machine learning workloads

Improve system reliability and performance through automation, observability, and proactive capacity planning

Own and evolve CI/CD pipelines, deployment automation, rollback mechanisms, and config management

Implement and maintain monitoring, alerting, and incident response processes (SLOs, runbooks, on‑call rotations)

Collaborate across engineering and data science teams to drive a culture of performance and reliability

Ensure security, compliance, and operational readiness across our cloud infrastructure

Drive post‑incident analysis and continuous improvement initiatives

Must‑Have Qualifications

8+ years of experience in SRE, DevOps, or infrastructure engineering roles

Deep experience with cloud environments (AWS preferred), containerization (Docker), and orchestration (Kubernetes)

Solid understanding of infrastructure‑as‑code (e.g., Terraform, Ansible)

Strong knowledge of Linux systems, networking, and systems performance tuning

Experience with monitoring and observability stacks (e.g., Prometheus, Grafana, Datadog, ELK, OpenTelemetry)

Proficiency with CI/CD tools and pipelines (e.g., GitHub Actions, ArgoCD)

Ability to debug complex systems and automate solutions in scripting languages (Python, Bash, etc.)

Excellent communication skills and the ability to work cross‑functionally

Nice‑to‑Have

Experience supporting data‑intensive platforms (Spark, Airflow, Kafka, etc.)

Familiarity with security practices for cloud‑native applications and infrastructure

Experience in high‑compliance or SOC 2 environments

What You’ll Get

Ownership of mission‑critical infrastructure in a company solving real‑world enterprise problems

A front‑row seat to a high‑performance engineering culture

The ability to influence how our platform scales—from deployment to incident management

An environment that values curiosity, accountability, and impact
