Talent.com
Senior Site Reliability Engineer
Gradle Technologies • San Francisco, CA, US
1 day ago
Job type
  • Full-time
Job description

Who We Are

Develocity is a first-of-its-kind toolchain observability and acceleration platform that helps software teams adopt and improve DORA capabilities (including continuous delivery) in order to achieve software delivery excellence. It combines build and test acceleration with deep build and test observability for Gradle Build Tool, Apache Maven™, sbt, npm, and Python, and applies to both CI and local builds and tests. Ultimately, Develocity provides an operational layer across an organization's toolchains to speed up, troubleshoot, and optimize local developer and remote CI feedback loops.

Our software is used by some of the world's leading software organizations, such as Netflix, Airbnb, SAP, several top ten banks, and many other major customers across all verticals. We regularly collaborate with these and other users to make our products continuously better.

We have partnered with the Apache Software Foundation, the Commonhaus Foundation, the Scala Center, the Micronaut Foundation, and other OSS projects like Spring, Quarkus, Kotlin, JUnit, AndroidX, and many more to bring the value of Develocity to the OSS community as well.

Our Values

Seek to Understand: Everything starts with listening and understanding, and we strive to understand different viewpoints, problems, and motivations. Before we take action, we ensure we truly grasp the challenges, perspectives, and goals.

Know the Why: We approach our work with a clear sense of purpose, ensuring every step is deliberate and focused. We take meaningful action with urgency, but never at the expense of thoughtful consideration.

Innovate & Iterate: We embrace challenges and are not afraid to try new things, even if they might fail. With deep understanding and a clear purpose, we can develop creative and bold solutions to tackle challenges.

Own the Outcome: We are empowered to take initiative and we maintain transparency in our work and its outcomes. When we execute, we take responsibility for our decisions, measure the success of our innovations, and learn from the results.

Who You Are

We're building a new SRE team and looking for founding members to help shape how we operate. You'll be responsible for the reliability, performance, and availability of Develocity instances serving paying customers, open-source projects, and public-facing services, plus supporting infrastructure like artifact registries.

You'll work on, and develop deep expertise in, our internally built Cloud Application Platform (Kubernetes on AWS). When incidents happen, you'll troubleshoot issues across the stack, from application to infrastructure. You'll collaborate with the Cloud Platform team to improve the tooling you depend on, and with engineering teams to build reliability into how we ship software. If you like automating things and hate doing the same task twice, you'll fit in well.

You'll be part of a distributed, remote-first team that values asynchronous communication and written documentation. Strong self-direction and clear communication across time zones are essential.

Responsibilities

  • Operate and maintain all Develocity instances and supporting services.
  • Participate in a follow-the-sun on-call rotation, owning incident response and troubleshooting issues across the stack.
  • Drive automation across application deployment, upgrades, monitoring, self-healing, and recovery.
  • Build and maintain observability for all managed services (logging, metrics, tracing, and alerting).
  • Work with engineering teams to build reliability into features from the start.
  • Run incident response and retrospectives, and make sure we learn from them.
  • Own disaster recovery, backups, and business continuity.
  • Communicate with customers during incidents and maintenance windows.
  • Optimize performance, resource usage, and costs.
  • Help evolve our SaaS operations as we grow.

Minimum qualifications

  • 5+ years in an SRE, DevOps, or equivalent role operating production services at scale.
  • Strong Kubernetes experience in production environments.
  • Cloud infrastructure expertise, preferably AWS (EKS, RDS, S3, EC2).
  • Proficiency with observability tools (Prometheus, Grafana) and Infrastructure as Code (Terraform).
  • Track record of incident management and response.
  • Knowledge of SRE best practices (SLAs, SLOs).
  • Scripting proficiency (Python, Bash) for automation.
  • Experience with 24/7 on-call rotations.
  • Strong written and verbal English communication.
Preferred qualifications

  • Experience operating SaaS platforms at scale.
  • Familiarity with Develocity.
  • JVM language experience (Java, Kotlin).
  • Disaster recovery planning and execution experience.
  • Customer-facing incident communication skills.
  • Experience establishing SRE practices in new or growing teams.
What We Offer

  • A ground-floor role in a new SRE team: you'll shape how we do things rather than inherit someone else's decisions.
  • Real ownership of production systems used by engineers at companies you've heard of.
  • Direct interaction with customers when things go wrong (and when they go right).
  • A culture that values automation over heroics.
  • In-person meetings, such as our annual company offsite and team meetings.
  • Work from home in a remote-first environment.
  • Competitive salaries and equity grants.
Compensation

The US salary range for this position is $150k-$190k, which reflects the target range for all US locations. Within this range, individual pay is determined by geographic location and additional factors including, but not limited to, experience, relevant skills, qualifications, seniority, performance, and travel requirements. Our recruiting team can share more information about the specific salary range for your location during the hiring process.

Location

  • Remote from anywhere in the Pacific (PST) time zone.
  • While our team works remotely and is spread across the globe, we deeply value daily interactions and collaboration.

Similar jobs

Senior Site Reliability Engineer - Networking
Lambda Inc. • San Francisco, CA, United States
Full-time
Lambda, The Superintelligence Cloud, is a leader in AI cloud infrastructure serving tens of thousands of customers. Our customers range from AI researchers to enterprises and hyperscalers. Lambda's m...
Last updated: 30+ days ago

Site Reliability Engineer US - San Francisco
Near Inc. • San Francisco, CA, United States
Full-time
The NEAR AI engineering team is developing decentralized and confidential machine learning infrastructure to power user owned AI. We currently focus on building infrastructure to enable private and ...
Last updated: 7 days ago

Senior Site Reliability Engineer – Platform
Icon Ventures • San Francisco, CA, United States
Full-time
At Quizlet, our mission is to help every learner achieve their outcomes in the most effective and delightful way. We blend cognitive science with machine learning to personalize and enhance the lear...
Last updated: 30+ days ago

Senior Technology Site Reliability Engineer
Cooley LLP • San Francisco, CA, United States
Full-time
Senior Technology Site Reliability Engineer. Locations: San Francisco; New York; Santa Monica; Los Angeles; Palo Alto. Time type: ...
Last updated: 30+ days ago

Site Reliability Engineer Hybrid - San Francisco
Grammarly, Inc. • San Francisco, CA, United States
Full-time
Superhuman offers a dynamic hybrid working model for this role. This flexible approach gives team members the best of both worlds: plenty of focus time along with in-person collaboration that helps ...
Last updated: 2 days ago

Senior Site Reliability Engineer: Scale & Reliability
Google Inc. • San Francisco, CA, United States
Full-time
A leading technology firm in San Francisco is seeking a Software Engineer III for site reliability engineering. This full-time role requires a Bachelor's degree in Computer Science and at least two ...
Last updated: 6 days ago

Software Engineer, Site Reliability (SRE)
Sierra • San Francisco, CA, United States
Full-time
At Sierra, we’re creating a platform to help businesses build better, more human customer experiences with AI. We are primarily an in-person company based in San Francisco, with growing offices in A...
Last updated: 30+ days ago

Senior Site Reliability Engineer
BetterUp • San Francisco, CA, United States
Full-time
Let’s face it, a company whose mission is human transformation better have some fresh thinking about the employer/employee relationship. We can’t cram it all in here, but you’ll start noticing it fr...
Last updated: 1 day ago

Senior Site Reliability Engineer
Motive Software • San Francisco, CA, United States
Full-time
Senior Site Reliability Engineer. Let’s face it, a company whose mission is human transformation better have some fresh thinking about the employer/employee relationship. We can’t cram it all in here...
Last updated: 30+ days ago

Software Engineer, Site Reliability Engineer (SRE)
Harvey • San Francisco, California, United States
Full-time
Harvey is a secure AI platform for legal and professional services that augments productivity and automates complex workflows. Harvey uses algorithms with reasoning-adept LLMs that have been customi...
Last updated: 30+ days ago

Site Reliability Engineer (SRE) - AI Infrastructure
Hamilton Barnes Associates Limited • San Francisco, CA, United States
Full-time
Are you looking for an exciting new opportunity? Join a stealth-mode hyperscale data center startup building a next-generation AI and cloud platform designed for startups and advanced research, pow...
Last updated: 30+ days ago

On-Site SRE (SF): Kubernetes, Terraform, 99.9% Uptime
Latent, Inc. • San Francisco, CA, United States
Full-time
A leading clinical AI company in San Francisco is seeking a Site Reliability Engineer (SRE) to own and enhance their production environment. This role encompasses responsibilities like designing inf...
Last updated: 30+ days ago

Reliability Engineer
Periodiclabs • Menlo Park, CA, United States
Full-time
We are an AI + physical sciences lab building state of the art models to make novel scientific discoveries. We are well funded and growing rapidly. Team members are owners who identify and solve prob...
Last updated: 30+ days ago

Senior Site Reliability Engineer
Alembic Technologies • San Francisco, CA, United States
Full-time
Senior Site Reliability Engineer. This range is provided by Alembic Technologies. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more. We’re looking fo...
Last updated: 30+ days ago

Founding SRE Engineer – Reliability & Growth
Asana • San Francisco, CA, United States
Full-time
A leading software company is seeking experienced Software Engineers to join the new Site Reliability Engineering team. This role focuses on building reliable, scalable systems and leading projects ...
Last updated: 30+ days ago

Senior Site Reliability Engineer
Mvp VC • San Francisco, CA, United States
Full-time
Loft Orbital is revolutionizing access to space by building reliable, shareable satellites that drastically reduce the time and complexity traditionally required to get to orbit. We operate satellit...
Last updated: 14 days ago

Site Reliability Engineer III
Veeam • San Francisco, CA, United States
Full-time
Veeam, the #1 global market leader in data resilience, believes businesses should control all their data whenever and wherever they need it. Veeam provides data resilience through data backup, data ...
Last updated: 2 days ago

Founding Site Reliability Engineer
Assort Health Inc. • San Francisco, CA, United States
Full-time
Our mission is to make exceptional healthcare accessible anytime, anywhere, for everyone. At Assort Health, we believe healthcare should feel effortless and connected: quick answers, clear communic...
Last updated: 30+ days ago