Talent.com
No longer accepting applications
Site Reliability Engineer

Runloop AI
San Francisco, California, United States
17 hours ago
Job type
  • Full-time
Job description

About Runloop

Runloop is building the foundational infrastructure for the next generation of AI development. We provide AI engineers and data scientists with lightning-fast, secure, and reproducible code sandboxes. Our platform enables teams to experiment, iterate, and deploy their projects without the friction of environment setup and dependencies. We are a small but mighty team dedicated to building a rock-solid platform that empowers innovation.

The Role

We’re looking for a skilled and passionate Site Reliability Engineer to join our team. As an SRE, you’ll be responsible for the reliability, observability, performance, and security of our core platform—the very foundation on which our users build their futures. You’ll work closely with our engineering team to develop and maintain the systems that power our code sandboxes, ensuring a seamless and stable experience for our customers. This is a critical role that blends a deep understanding of operations with a software engineering mindset.

Responsibilities

Design and maintain our production infrastructure on cloud platforms like AWS, GCP, or Azure.

Monitor and respond to system alerts and incidents using Grafana and Prometheus, ensuring high availability and a secure environment for our users’ code.

Collaborate with developers to ensure new features and services are designed with scalability and reliability in mind.

Troubleshoot and resolve complex issues related to our infrastructure, networking, and the sandbox environment.

Participate in an on‑call rotation to support our production systems.

Define and track SLIs/SLOs, manage error budgets, and proactively monitor distributed systems with logging and tracing.

Automate deployments, scaling, provisioning, and recovery tasks to reduce toil and build self‑healing systems.

Lead incident response, conduct root‑cause analysis, and facilitate blameless post‑mortems to drive continual improvement.

Collaborate cross‑functionally with product, engineering, and developer relations to ensure reliable releases and an outstanding developer experience.

Plan for capacity growth, forecast system usage, and contribute to safe release and change management processes.

Mentor and support front‑end developers in building reliable distributed front‑end systems (CDNs, caching, client‑side observability).

Qualifications

Proven experience as an SRE or DevOps Engineer, or in a similar role.

Strong programming skills in languages like Python or Go.

Deep expertise in containerization technologies such as Docker and Kubernetes.

Experience with cloud infrastructure and tools like Terraform and/or Pulumi.

Familiarity with monitoring and alerting tools like Prometheus, Grafana, or Datadog.

A solid understanding of networking, security, and Linux systems administration.

Experience designing, scaling, and maintaining distributed systems (backend platforms, APIs, or front‑end infrastructure).

Proficiency in implementing observability frameworks (metrics, logging, tracing) and aligning reliability goals with developer velocity.

Hands‑on experience managing incidents, running on‑call operations, and producing actionable post‑mortems.

Ability to mentor engineers and influence reliability practices across teams, especially for front‑end infrastructure and performance.

Bonus Points

Experience with chaos engineering techniques, front‑end observability tools (e.g., Sentry, RUM, synthetic monitoring), or building CI/CD pipelines for front‑end delivery.

Benefits

Competitive salary and equity.

Comprehensive health, dental, and vision insurance for you and your dependents.

Opportunity to work on cutting‑edge AI technology and make a real impact on the future of software engineering.

Free lunch and snacks.

Location

In office four days a week in San Francisco, with one optional work‑from‑home day per week.

Join Us

If you’re excited about shaping the future of AI‑driven software engineering and empowering developers to build the next generation of coding tools, we want to hear from you. Join Runloop and be at the forefront of the AI revolution in software development.

Runloop is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability status, protected veteran status, sexual orientation, gender identity, or any other characteristic protected by law.
