Staff Site Reliability Engineer - Observability
Hispanic Alliance for Career Enhancement • Boston, MA, United States
30+ days ago
Job type
  • Full-time
Job description

Overview

At CVS Health, we’re building a world of health around every consumer and surrounding ourselves with dedicated colleagues who are passionate about transforming health care.

As the nation’s leading health solutions company, we reach millions of Americans through our local presence, digital channels and more than 300,000 purpose-driven colleagues – caring for people where, when and how they choose in a way that is uniquely more connected, more convenient and more compassionate. And we do it all with heart, each and every day.

Responsibilities

Position Summary

The PCW (Pharmacy & Consumer Wellness) Edge SRE team is seeking a Staff Site Reliability Engineer (SRE) with a primary focus on observability to join our team. This role will lead the design, implementation, and optimization of observability systems to ensure the reliability, performance, and scalability of our environment with emphasis on edge environments. You will collaborate with cross-functional teams to build robust monitoring, alerting, and telemetry solutions, enabling proactive issue detection and resolution across distributed systems. As a senior member of the SRE team, you will drive best practices, mentor others, and shape the strategic evolution of our observability ecosystem in a complex, edge-centric architecture.

Observability Strategy & Implementation:

Design and implement comprehensive observability solutions tailored for edge computing environments, including monitoring, logging, tracing, and metrics collection, to provide deep visibility into system performance and health across distributed remote facilities.

Define and maintain Service Level Indicators (SLIs), Service Level Objectives (SLOs), and business KPIs to measure and enhance system reliability in edge and centralized infrastructure.

Build and optimize dashboards, visualizations, and alerting systems to enable real-time insights and rapid incident response for edge nodes and remote facilities.

Implement distributed tracing and log aggregation systems to troubleshoot complex issues in edge computing environments.

System Reliability & Performance in Edge Computing:

Collaborate with engineering teams to ensure applications and infrastructure at edge locations are designed with observability in mind, incorporating best practices for instrumentation and monitoring in resource-constrained environments.

Drive proactive identification of issues in edge facilities through advanced observability tools, reducing Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR) across distributed systems.

Lead incident postmortems, analyzing root causes specific to edge environments and implementing observability-driven improvements to prevent recurrence.

Tooling & Automation for Edge Environments:

Develop and maintain tools, scripts, and automation to enhance observability pipelines, optimizing for the unique challenges of edge computing, such as bandwidth limitations and intermittent connectivity.

Evaluate and integrate industry-standard observability tools (e.g., Prometheus, Grafana, ELK Stack, OpenTelemetry) and recommend solutions tailored for edge computing use cases.

Optimize observability data storage, retention, and querying to balance performance, cost, and scalability across a large number of remote facilities.

Leadership & Collaboration:

Mentor and guide junior SREs and engineers on observability best practices for edge computing, fostering a culture of reliability and proactive monitoring.

Partner with solution, engineering, and business teams to align observability efforts with business objectives, ensuring seamless operation of edge and centralized systems.

Lead cross-functional initiatives to improve observability, reliability, and operational efficiency across distributed edge infrastructure.

Continuous Improvement:

Stay current with emerging observability trends, tools, and methodologies, particularly those suited for edge computing and distributed systems, and advocate for their adoption.

Contribute to the development of observability standards, runbooks, and documentation tailored for edge environments to ensure consistency and scalability.

Drive cost optimization for observability infrastructure while maintaining high-quality monitoring and alerting capabilities across remote facilities.

Required Qualifications

7+ years of experience in Site Reliability Engineering, Observability Engineering, or a related field.

5+ years of experience with observability tools and platforms such as Prometheus, Grafana, Splunk, ELK, OpenTelemetry, or similar.

3+ years of experience with microservices, containerized environments (e.g., Kubernetes, Docker), and distributed systems, particularly in edge deployments.

Preferred Qualifications

Experience with implementation of AIOps.

Demonstrated ability to handle observability challenges in environments with intermittent connectivity, high latency, or geographically dispersed infrastructure.

Strong proficiency in programming/scripting languages (e.g., Python, Java) for automation and tooling in distributed environments.

Expertise working in edge computing environments with a large number of remote facilities, managing observability for distributed, high-latency, or resource-constrained systems.

Experience with OpenTelemetry or other open-source observability frameworks optimized for edge computing.

Familiarity with chaos engineering principles to validate observability systems in edge environments.

Certifications in cloud platforms (Google Cloud Professional certification) or Kubernetes.

Strong problem-solving skills with a proactive, analytical mindset, particularly for addressing edge computing challenges.

Excellent communication and collaboration skills to work effectively with cross-functional teams across centralized and remote locations.

Ability to mentor and lead technical initiatives with a focus on observability and reliability in edge environments.

Comfortable working in a fast-paced, dynamic environment with a focus on delivering customer value.

Knowledge of incident management processes and tools (e.g., ServiceNow, xMatters, Opsgenie) tailored for distributed systems.

Deep understanding of monitoring, logging, and tracing concepts, including metrics collection, log aggregation, and distributed tracing for edge and centralized systems.

Familiarity with cloud infrastructure, CI/CD pipelines, and edge-specific deployment patterns.

Education

Bachelor’s degree, or equivalent experience (HS diploma + 4 years relevant experience)

Business Overview

Bring your heart to CVS Health

Every one of us at CVS Health shares a single, clear purpose: Bringing our heart to every moment of your health. This purpose guides our commitment to deliver enhanced human-centric health care for a rapidly changing world. Anchored in our brand, with heart at its center, our purpose sends a personal message that how we deliver our services is just as important as what we deliver. Our Heart At Work Behaviors support this purpose. We want everyone who works at CVS Health to feel empowered by the role they play in transforming our culture and accelerating our ability to innovate and deliver solutions to make health care more personal, convenient and affordable.

We strive to promote and sustain a culture of diversity, inclusion and belonging every day. CVS Health is an affirmative action employer, and is an equal opportunity employer, as are the physician-owned businesses for which CVS Health provides management services. We do not discriminate in recruiting, hiring, promotion, or any other personnel action based on race, ethnicity, color, national origin, sex/gender, sexual orientation, gender identity or expression, religion, age, disability, protected veteran status, or any other characteristic protected by applicable federal, state, or local law.

We proudly support and encourage people with military experience (active, veterans, reservists and National Guard) as well as military spouses to apply for CVS Health job opportunities.

Pay Range

The typical pay range for this role is:

$118,450.00 - $284,280.00

This pay range represents the base hourly rate or base annual full-time salary for all positions in the job grade within which this position falls. The actual base salary offer will depend on a variety of factors including experience, education, geography and other relevant factors. This position is eligible for a CVS Health bonus, commission or short-term incentive program in addition to the base pay range listed above. This position also includes an award target in the company’s equity award program.

Benefits & Additional Information

Our people fuel our future. Our teams reflect the customers, patients, members and communities we serve and we are committed to fostering a workplace where every colleague feels valued and that they belong.

Great benefits for great people

We take pride in our comprehensive and competitive mix of pay and benefits – investing in the physical, emotional and financial wellness of our colleagues and their families to help them be the healthiest they can be. In addition to our competitive wages, our great benefits include:

Affordable medical plan options, a 401(k) plan (including matching company contributions), and an employee stock purchase plan.

No-cost programs for all colleagues including wellness screenings, tobacco cessation and weight management programs, confidential counseling and financial coaching.

Benefit solutions that address the different needs and preferences of our colleagues including paid time off, flexible work schedules, family leave, dependent care resources, colleague assistance programs, tuition assistance, retiree medical access and many other benefits depending on eligibility.

For more information, please refer to the benefits section of the CVS Health careers site.

We anticipate the application window for this opening will close on: 10/23/2025

Qualified applicants with arrest or conviction records will be considered for employment in accordance with all federal, state and local laws.

We are an equal opportunity and affirmative action employer. We do not discriminate in recruiting, hiring, promotion, or any other personnel action based on race, ethnicity, color, national origin, sex/gender, sexual orientation, gender identity or expression, religion, age, disability, protected veteran status, or any other characteristic protected by applicable federal, state, or local law.
