Talent.com
No longer accepting applications
Staff Site Reliability Engineer - Observability

CVS Health
Boston, MA, United States
1 day ago
Job type
  • Full-time
Job description

Overview

At CVS Health, we’re building a world of health around every consumer and surrounding ourselves with dedicated colleagues who are passionate about transforming health care.

As the nation’s leading health solutions company, we reach millions of Americans through our local presence, digital channels and more than 300,000 purpose-driven colleagues – caring for people where, when and how they choose in a way that is uniquely more connected, more convenient and more compassionate. And we do it all with heart, each and every day.

Responsibilities

Position Summary

The PCW (Pharmacy & Consumer Wellness) Edge SRE team is seeking a Staff Site Reliability Engineer (SRE) with a primary focus on observability to join our team. This role will lead the design, implementation, and optimization of observability systems to ensure the reliability, performance, and scalability of our environment with emphasis on edge environments. You will collaborate with cross-functional teams to build robust monitoring, alerting, and telemetry solutions, enabling proactive issue detection and resolution across distributed systems. As a senior member of the SRE team, you will drive best practices, mentor others, and shape the strategic evolution of our observability ecosystem in a complex, edge-centric architecture.

Observability Strategy & Implementation:

Design and implement comprehensive observability solutions tailored for edge computing environments, including monitoring, logging, tracing, and metrics collection, to provide deep visibility into system performance and health across distributed remote facilities.

Define and maintain Service Level Indicators (SLIs), Service Level Objectives (SLOs), and business KPIs to measure and enhance system reliability in edge and centralized infrastructure.

Build and optimize dashboards, visualizations, and alerting systems to enable real-time insights and rapid incident response for edge nodes and remote facilities.

Implement distributed tracing and log aggregation systems to troubleshoot complex issues in edge computing environments.

System Reliability & Performance in Edge Computing:

Collaborate with engineering teams to ensure applications and infrastructure at edge locations are designed with observability in mind, incorporating best practices for instrumentation and monitoring in resource-constrained environments.

Drive proactive identification of issues in edge facilities through advanced observability tools, reducing Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR) across distributed systems.

Lead incident postmortems, analyzing root causes specific to edge environments and implementing observability-driven improvements to prevent recurrence.

Tooling & Automation for Edge Environments:

Develop and maintain tools, scripts, and automation to enhance observability pipelines, optimizing for the unique challenges of edge computing, such as bandwidth limitations and intermittent connectivity.

Evaluate and integrate industry-standard observability tools (e.g., Prometheus, Grafana, ELK Stack, OpenTelemetry) and recommend solutions tailored for edge computing use cases.

Optimize observability data storage, retention, and querying to balance performance, cost, and scalability across a large number of remote facilities.

Leadership & Collaboration:

Mentor and guide junior SREs and engineers on observability best practices for edge computing, fostering a culture of reliability and proactive monitoring.

Partner with solution, engineering, and business teams to align observability efforts with business objectives, ensuring seamless operation of edge and centralized systems.

Lead cross-functional initiatives to improve observability, reliability, and operational efficiency across distributed edge infrastructure.

Continuous Improvement:

Stay current with emerging observability trends, tools, and methodologies, particularly those suited for edge computing and distributed systems, and advocate for their adoption.

Contribute to the development of observability standards, runbooks, and documentation tailored for edge environments to ensure consistency and scalability.

Drive cost optimization for observability infrastructure while maintaining high-quality monitoring and alerting capabilities across remote facilities.

Required Qualifications

7+ years of experience in Site Reliability Engineering, Observability Engineering, or a related field.

5+ years of experience with observability tools and platforms such as Prometheus, Grafana, Splunk, ELK, OpenTelemetry, or similar.

3+ years of experience with microservices, containerized environments (e.g., Kubernetes, Docker), and distributed systems, particularly in edge deployments.

Preferred Qualifications

Experience with implementation of AIOps.

Demonstrated ability to handle observability challenges in environments with intermittent connectivity, high latency, or geographically dispersed infrastructure.

Strong proficiency in programming/scripting languages (e.g., Python, Java) for automation and tooling in distributed environments.

Expertise working in edge computing environments with a large number of remote facilities, managing observability for distributed, high-latency, or resource-constrained systems.

Experience with OpenTelemetry or other open-source observability frameworks optimized for edge computing.

Familiarity with chaos engineering principles to validate observability systems in edge environments.

Certifications in cloud platforms (Google Cloud Professional certification) or Kubernetes.

Strong problem-solving skills with a proactive, analytical mindset, particularly for addressing edge computing challenges.

Excellent communication and collaboration skills to work effectively with cross-functional teams across centralized and remote locations.

Ability to mentor and lead technical initiatives with a focus on observability and reliability in edge environments.

Comfortable working in a fast-paced, dynamic environment with a focus on delivering customer value.

Knowledge of incident management processes and tools (e.g., ServiceNow, xMatters, Opsgenie) tailored for distributed systems.

Deep understanding of monitoring, logging, and tracing concepts, including metrics collection, log aggregation, and distributed tracing for edge and centralized systems.

Familiarity with cloud infrastructure, CI/CD pipelines, and edge-specific deployment patterns.

Education

Bachelor’s degree, or equivalent experience (HS diploma + 4 years relevant experience)

Business Overview

Bring your heart to CVS Health

Every one of us at CVS Health shares a single, clear purpose: Bringing our heart to every moment of your health. This purpose guides our commitment to deliver enhanced human-centric health care for a rapidly changing world. Anchored in our brand, with heart at its center, our purpose sends a personal message that how we deliver our services is just as important as what we deliver. Our Heart At Work Behaviors support this purpose. We want everyone who works at CVS Health to feel empowered by the role they play in transforming our culture and accelerating our ability to innovate and deliver solutions to make health care more personal, convenient and affordable.

We strive to promote and sustain a culture of diversity, inclusion and belonging every day. CVS Health is an affirmative action employer, and is an equal opportunity employer, as are the physician-owned businesses for which CVS Health provides management services. We do not discriminate in recruiting, hiring, promotion, or any other personnel action based on race, ethnicity, color, national origin, sex/gender, sexual orientation, gender identity or expression, religion, age, disability, protected veteran status, or any other characteristic protected by applicable federal, state, or local law.

We proudly support and encourage people with military experience (active, veterans, reservists and National Guard) as well as military spouses to apply for CVS Health job opportunities.

Pay Range

The typical pay range for this role is:

$118,450.00 - $284,280.00

This pay range represents the base hourly rate or base annual full-time salary for all positions in the job grade within which this position falls. The actual base salary offer will depend on a variety of factors including experience, education, geography and other relevant factors. This position is eligible for a CVS Health bonus, commission or short-term incentive program in addition to the base pay range listed above. This position also includes an award target in the company’s equity award program.

Benefits & Additional Information

Our people fuel our future. Our teams reflect the customers, patients, members and communities we serve and we are committed to fostering a workplace where every colleague feels valued and that they belong.

Great benefits for great people

We take pride in our comprehensive and competitive mix of pay and benefits – investing in the physical, emotional and financial wellness of our colleagues and their families to help them be the healthiest they can be. In addition to our competitive wages, our great benefits include:

Affordable medical plan options, a 401(k) plan (including matching company contributions), and an employee stock purchase plan.

No-cost programs for all colleagues including wellness screenings, tobacco cessation and weight management programs, confidential counseling and financial coaching.

Benefit solutions that address the different needs and preferences of our colleagues including paid time off, flexible work schedules, family leave, dependent care resources, colleague assistance programs, tuition assistance, retiree medical access and many other benefits depending on eligibility.

For more information, please refer to the benefits section of the CVS Health careers site.

We anticipate the application window for this opening will close on: 10/23/2025

Qualified applicants with arrest or conviction records will be considered for employment in accordance with all federal, state and local laws.

We are an equal opportunity and affirmative action employer. We do not discriminate in recruiting, hiring, promotion, or any other personnel action based on race, ethnicity, color, national origin, sex/gender, sexual orientation, gender identity or expression, religion, age, disability, protected veteran status, or any other characteristic protected by applicable federal, state, or local law.
