Talent.com
Site Reliability Engineer - Observability

Rivian and Volkswagen Group Technologies • Palo Alto, CA, United States
4 days ago
Job type
  • Full-time
Job description

Overview

We are seeking a Senior Site Reliability Engineer (SRE) specializing in Observability to join RivianVW's Data Platform - Production Engineering team. In this role, you will design, implement, and scale robust observability systems to ensure the health, performance, and reliability of our production environment. You will collaborate closely with cross-functional teams to create telemetry solutions that provide actionable insights into our distributed systems.

Responsibilities

  • Observability Platform Design: Architect, implement, and maintain observability systems, leveraging tools like Datadog, LGTM stack, OpenTelemetry, and Vector to enable real-time performance monitoring, logging, and alerting.
  • Telemetry Optimization: Evolve and scale telemetry pipelines to ensure low latency and high availability for metrics, logs, and traces across multi-cloud environments.
  • Performance Engineering: Proactively identify performance bottlenecks, optimize systems, and provide recommendations for reliability improvements.
  • Scalable Automation: Implement automation solutions to scale systems sustainably while driving improvements in reliability and deployment velocity.
  • Incident Management: Collaborate with the incident response team to establish data-driven debugging and troubleshooting processes using observability data.
  • Tooling Development: Create and maintain self-service observability tools and dashboards to empower teams across the organization.
  • Cross-functional Collaboration: Partner with development, DevOps, and infrastructure teams to define SLOs/SLIs and ensure observability is embedded throughout the software lifecycle.

Qualifications

  • Educational Background: Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience.
  • Experience: 5+ years in Site Reliability Engineering or a related role with a strong emphasis on observability.
  • Technical Expertise:
      • Proficiency in designing and operating observability platforms with tools like Prometheus, Grafana, Loki, Jaeger, or Datadog.
      • Experience with OpenTelemetry and distributed tracing in microservices architectures.
      • Deep knowledge of Kubernetes (e.g., EKS), ArgoCD, and Crossplane.
  • Programming Skills: Strong proficiency in Python, Go, or similar languages for building automation and custom telemetry solutions.
  • Cloud & Systems: Familiarity with multi-cloud setups, containerization (Docker), and Linux system fundamentals.
  • Soft Skills: Exceptional problem-solving, communication, and a data-driven approach to decision-making.

Pay Disclosure

Salary Range / Hourly Rate for California Based Applicants: $146,900 - $194,610 USD

Actual compensation will be determined based on experience, location, and other factors permitted by law.

Benefits Summary

Rivian and Volkswagen Group Technologies provides robust medical, prescription, dental and vision insurance packages for full-time employees, their spouse or domestic partner, and their children up to age 26. Coverage is effective on the first day of employment.

Equal Opportunity

Rivian and Volkswagen Group Technologies is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, sex, sexual orientation, gender, gender expression, gender identity, genetic information or characteristics, physical or mental disability, marital/domestic partner status, age, military/veteran status, medical condition, or any other characteristic protected by law. We are also committed to ensuring compliance with all applicable fair employment practice laws regarding citizenship and immigration status.

Accommodations

Rivian and Volkswagen Group Technologies is committed to ensuring that our hiring process is accessible for persons with disabilities. If you have a disability or limitation, such as those covered by the Americans with Disabilities Act, that requires accommodations to assist you in the search and application process, please email us at candidateaccommodations@rivian.com.

Candidate Data Privacy

Rivian and Volkswagen Group Technologies ("Rivian and Volkswagen Group Technologies") may collect, use and disclose your personal information or personal data (within the meaning of the applicable data protection laws) when you apply for employment and/or participate in our recruitment processes. This data includes contact, demographic, communications, educational, professional, employment, social media/website, network/device, recruiting system usage/interaction, security and preference information. Rivian and Volkswagen Group Technologies may use your Candidate Personal Data for the purposes of (i) tracking interactions with our recruiting system; (ii) carrying out, analyzing and improving our application and recruitment process, including assessing you and your application and conducting employment, background and reference checks; (iii) establishing an employment relationship or entering into an employment contract with you; (iv) complying with our legal, regulatory and corporate governance obligations; (v) recordkeeping; (vi) ensuring network and information security and preventing fraud; and (vii) as otherwise required or permitted by applicable law. Rivian and Volkswagen Group Technologies may share your Candidate Personal Data with internal personnel, Rivian and Volkswagen Group Technologies affiliates, and service providers including background checks, staffing services, and cloud services. They may transfer or store internationally your Candidate Personal Data, including to or in the United States, Canada, and the European Union, and this data may be subject to the laws and accessible to authorities of such jurisdictions. Please see our Candidate Data Privacy Notice (English) and Candidate Data Privacy Notice (Serbian) for more information.

Please note that we are currently not accepting applications from third party application services.

Seniority level
  • Not Applicable
Employment type
  • Full-time
Job function
  • Engineering and Information Technology
Industries
  • Software Development