Talent.com
Staff Reliability Engineer
Celonis • Redwood City, CA, United States
30+ days ago
Job type
  • Full-time
Job description

We're Celonis, the global leader in Process Mining technology and one of the world's fastest-growing SaaS firms. We believe there is a massive opportunity to unlock productivity by placing data and intelligence at the core of business processes - and for that, we need you to join us.

The Team

As a member of our Reliability Engineering team, you will play a critical role in ensuring the health, performance, and resilience of our platform. The team applies advanced software engineering and Site Reliability Engineering (SRE) principles to drive system reliability, scalability, and operational excellence across the organization.

The Role

  • Join a highly technical, collaborative, and innovation-driven team that blends Site Reliability Engineering with modern Software Engineering practices to build resilient and scalable systems.
  • Lead reliability efforts for a fleet of 80+ FedRAMP-compliant microservices running on Kubernetes, applying SRE principles to drive observability, automation, and incident prevention.
  • Develop and enforce SLOs, SLAs, and error budgets to drive reliability-focused development.
  • Provide mentorship and technical leadership across the SRE and engineering teams.
  • Own high-priority application incident escalations, performing deep technical analysis and restoration within defined SLOs, while continuously improving detection and response mechanisms.
  • Engineer solutions to enhance the availability, latency, and performance of production services—automating manual processes to eliminate toil and scale operational efficiency.
  • Collaborate closely with platform and application engineering teams to conduct post-incident reviews, extract insights, and implement systemic changes that improve overall reliability.

The qualifications you need:

  • Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related technical field (or equivalent hands-on experience).
  • 8+ years of experience in software engineering or SRE roles.
  • Deep experience with cloud platforms (AWS, GCP, or Azure).
  • Proficiency in Java, the Spring framework, and Python (or a similar scripting language) in a Linux environment.
  • Prior experience contributing to Site Reliability Engineering initiatives or similar operational roles.
  • Demonstrated ability to lead projects and influence engineering culture.
  • Knowledge of SRE principles, including SLI/SLO design, error budgets, and toil reduction strategies.
  • Excellent written and verbal communication skills in English.
  • Please note: This position is not eligible for immigration visa sponsorship, now or in the future.

Nice to Have

  • Experience with observability and monitoring tools (e.g., Datadog).
  • Experience in developing and operating production-grade, scalable services using Kubernetes and elastic cloud architectures.
  • Experience with CI/CD pipelines and tools such as ArgoCD, GitHub Actions, or similar.
  • Experience with Infrastructure as Code (IaC) tools such as Terraform and Kustomize.
  • Exposure to incident management practices, on-call rotations, and postmortem culture.
  • Visa sponsorship is not offered for this role.

    Total compensation package will include base salary + bonus/commission + equity + benefits (health, dental, life, 401k, and paid time off). Please note that the base salary range is a guideline, and that the actual total compensation offer will be determined based on various factors, including, but not limited to, the applicant's qualifications, skills, experience, and location.

    The base salary range for the role in California, based on a full-time schedule, is $195,000 to $235,000 USD.

    What Celonis Can Offer You:

  • The unique opportunity to work with industry-leading process mining technology
  • Investment in your personal growth and skill development (clear career paths, internal mobility opportunities, L&D platform, mentorships, and more)
  • Great compensation and benefits packages (equity in the form of restricted stock units, life insurance, time off, generous leave for new parents from day one, and more). For intern and working student benefits, click here.
  • Physical and mental well-being support (subsidized gym membership, access to counseling, virtual events on well-being topics, and more)
  • A global and growing team of Celonauts from diverse backgrounds to learn from and work with
  • An open-minded culture with innovative, autonomous teams
  • Business Resource Groups to help you feel connected, valued and seen (Black@Celonis, Women@Celonis, Parents@Celonis, Pride@Celonis, Resilience@Celonis, and more)
  • A clear set of company values that guide everything we do: Live for Customer Value, The Best Team Wins, We Own It, and Earth Is Our Future

    About Us:

    Celonis helps some of the world’s largest and most esteemed brands make processes work for people, companies and the planet. With over 5,000 enterprise customer deployments across nearly every industry, the Celonis Process Intelligence Platform uses process mining and AI to give you a living digital twin of your business operation. It’s system-agnostic and without bias, and empowers companies to reduce waste, create value and benefit people across the top, bottom, and green lines. Since 2011, the Celonis platform has enabled its customers to identify more than $18 billion in value. Celonis is headquartered in Munich, Germany, and New York City, USA, with more than 20 offices worldwide.

    Get familiar with the Celonis Process Intelligence Platform by watching this video.

    Data Privacy, Equal Opportunity, and Accessibility Information

    Celonis is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment and equal opportunity in all aspects of employment. We will not tolerate any unlawful discrimination or harassment of any kind. We make all employment decisions without regard to race/ethnicity, color, sex, pregnancy, age, sexual orientation, gender identity or expression, transgender status, national origin, citizenship status, religion, physical or mental disability, veteran status, or any other factor protected by applicable anti-discrimination laws. As a US federal contractor, we are committed to the principles of affirmative action in accordance with applicable laws and regulations. Different makes us better.

    Any information you submit to Celonis as part of your application will be processed in accordance with Celonis’ Statements on Data Privacy, Equal Opportunity and Accessibility.

    Please be aware of common job offer scams, impersonators and frauds. Learn more here.

    By submitting this application, you confirm that you agree to the storing and processing of your personal data by Celonis as described in our Privacy Notice for the Application and Hiring Process.
