Talent.com
Site Reliability Engineer
No longer accepting applications

Rethink recruit • San Francisco, CA, United States
30+ days ago
Job type
  • Full-time
Job description

About Runloop

Runloop is building the foundational infrastructure for the next generation of AI development. We provide AI engineers and data scientists with lightning-fast, secure, and reproducible code sandboxes. Our platform eliminates friction in environment setup and dependencies, enabling teams to experiment, iterate, and deploy seamlessly. We’re a small but dedicated team working to deliver a rock-solid platform that empowers innovation.

The Role

We’re looking for a skilled Site Reliability Engineer (SRE) to ensure the reliability, observability, performance, and security of our core platform—the foundation upon which our users build. You’ll work closely with engineering to maintain resilient systems that power our code sandboxes, while mentoring peers on reliability practices. This role blends deep operational expertise with a software engineering mindset.

What You’ll Do

  • Design, operate, and improve production infrastructure on AWS, GCP, or Azure.
  • Define and monitor SLIs / SLOs, manage error budgets, and maintain observability with Prometheus, Grafana, and logging / tracing frameworks.
  • Build automation for deployments, scaling, and recovery—reducing toil and creating self-healing systems.
  • Lead incident response, root‑cause analysis, and blameless post‑mortems.
  • Collaborate with developers to design scalable, reliable services.
  • Optimize distributed systems, networking, and sandbox performance.
  • Plan for capacity growth and support safe release / change management.
  • Mentor engineers on reliability and front‑end distributed systems (CDNs, caching, client observability).

Qualifications

  • Proven experience as an SRE, DevOps Engineer, or similar role.
  • Strong programming skills (Python or Go preferred).
  • Deep knowledge of containerization (Docker, Kubernetes).
  • Expertise in infrastructure-as-code (Terraform or Pulumi).
  • Strong understanding of networking, Linux, and system security.
  • Hands‑on experience with distributed systems and observability (metrics, logs, tracing).
  • Skilled in incident management, on‑call rotations, and post‑mortem processes.
  • Ability to mentor and influence best practices across teams.

Bonus Points

  • Experience with chaos engineering, CI / CD for front‑end delivery, or observability tools like Sentry, RUM, or synthetic monitoring.

Benefits

  • Competitive salary and equity.
  • Comprehensive health, dental, and vision insurance for you and your dependents.
  • Free lunch and snacks.
  • Opportunity to shape the future of AI‑driven software engineering in a high‑impact role.

Location

On‑site in San Francisco, CA (in office 4 days / week, optional 1 day WFH).

Join Us

If you’re passionate about building resilient systems that empower developers and want to shape the future of AI‑driven software engineering, we’d love to hear from you. Join Runloop and help build the infrastructure that powers tomorrow’s AI.

Runloop is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability status, protected veteran status, sexual orientation, gender identity, or any other characteristic protected by law.
