Talent.com
Senior Site Reliability Engineer

TAG - The Aspen Group
Chicago, IL, United States
2 days ago
Job type
  • Full-time
Job description

The Aspen Group (TAG) is one of the largest and most trusted retail healthcare business support organizations in the U.S. It has supported over 20,000 healthcare professionals and team members at close to 1,500 health and wellness offices across 48 states in four distinct categories: dental care, urgent care, medical aesthetics, and animal health. Working in partnership with independent practice owners and clinicians, the team is united by a single purpose: to prove that healthcare can be better and smarter for everyone. TAG provides a comprehensive suite of centralized business support services that power the impact of five consumer-facing businesses: Aspen Dental, ClearChoice Dental Implant Centers, WellNow Urgent Care, Chapter Aesthetic Studio, and Lovet Pet Health Care. Each brand has access to a deep community of experts, tools, and resources to grow its practices, and an unwavering commitment to delivering high-quality consumer healthcare experiences at scale.

As a Senior Site Reliability Engineer (SRE) at TAG - The Aspen Group, you will be responsible for ensuring the reliability, performance, and scalability of our core systems. This role involves proactively building and managing monitoring solutions, leading incident response, and continuously optimizing system performance to exceed business objectives. We are actively integrating AI and machine learning into our operational workflows, and you will be on the front lines, using intelligent automation and machine learning to build a proactive, resilient infrastructure. This is an opportunity to go beyond traditional SRE work by applying cutting-edge technology to solve complex reliability challenges.

Responsibilities:

Intelligent Site Reliability Engineering:

  • Design and build highly scalable and resilient systems to support our applications and services, incorporating predictive analytics to anticipate reliability risks.
  • Develop and manage Service Level Objectives (SLOs) and Service Level Indicators (SLIs) using machine learning anomaly detection to ensure systems meet reliability targets.
  • Drive improvements in system reliability, availability, and performance through proactive measures, automation, and intelligent failure prediction.

Advanced Observability:

  • Implement and manage comprehensive monitoring and alerting solutions, integrating with intelligent observability platforms that reduce alert noise and correlate events.
  • Develop and maintain dashboards and reporting tools that provide data-driven insights for actionable troubleshooting recommendations and performance optimization.
  • Evaluate and integrate advanced monitoring tools and operational intelligence platforms to enhance observability and root cause identification.

Proactive Incident Management:

  • Lead and participate in incident response efforts, using intelligent log analysis and automated event correlation to speed up troubleshooting and root cause identification.
  • Develop and maintain incident management processes incorporating automated decision support systems to improve response times and minimize service disruptions.
  • Conduct post-incident reviews, using automated pattern recognition and trend analysis to identify systemic issues and implement preventive measures.

Performance and Capacity Optimization:

  • Analyze performance metrics and logs, supported by advanced observability tools, to detect bottlenecks and inefficiencies.
  • Collaborate with development teams to implement automated profiling and optimization recommendations for code and infrastructure improvements.
  • Perform capacity planning using machine learning forecasting models to ensure systems can handle current and future loads.

Automation and Process Improvement:

  • Develop and implement automation solutions, including intelligent runbook automation, self-healing systems, and automated incident triage.
  • Identify and drive process improvements by applying machine learning to operational data for continuous optimization.
  • Maintain documentation that includes automation and machine learning guidelines for monitoring, incident management, and SRE best practices.

Collaboration and Communication:

  • Work closely with engineering, operations, and product teams to align reliability and monitoring goals, including automation adoption strategies.
  • Communicate effectively with stakeholders, providing regular updates on system health, incidents, performance improvements, and data-driven insights.
  • Foster a culture of collaboration, knowledge sharing, and automation best practices within the team and across the organization.

Requirements:

  • Bachelor's degree in computer science or a related technical field.
  • At least 5 years of experience in Site Reliability Engineering or a similar role.
  • Strong proficiency in at least one programming language such as Python, Go, or C#.
  • Demonstrated experience applying machine learning and automation to operational workflows such as monitoring, alerting, and incident response.
  • Expertise with infrastructure-as-code tools such as Terraform.
  • Proven experience working with and monitoring container environments such as Cloud Run and Kubernetes.
  • Hands-on experience working within Azure, AWS, and GCP environments (GCP preferred).
  • Strong understanding of networking, distributed systems, and cloud infrastructure.
  • Familiarity with intelligent monitoring platforms and operational analytics tools such as Prometheus, Grafana, OpenSearch, Sentry, and Google Cloud Observability.
  • Excellent problem-solving skills and the ability to work independently and as part of a team.
  • Experience with incident management, root cause analysis, and automated operational workflows.

Annual pay range: $129,000-$160,000

A generous benefits package that includes paid time off, health, dental, vision, and a 401(k) savings plan with match.
