Talent.com
Senior Site Reliability Engineer
GOVX • FL, US
2 days ago
Job type
  • Full-time
  • Remote
Job description

GOVX is seeking an experienced Senior Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of our production systems through automation, observability, and operational excellence. This position is remote, but candidates must be located in one of the following states: California, Washington, Texas, Tennessee, Florida, Colorado, or New York.

The Senior Site Reliability Engineer (SRE) plays a key role in maintaining resilient infrastructure, monitoring critical services, and improving deployment and recovery processes across environments. The Senior Site Reliability Engineer works under the direction of the Director of Engineering and collaborates closely with Site Reliability Engineers, Automation Engineers, and other members of the engineering organization.

This position will report to the Director of Engineering.

Responsibilities

  • Maintain scalable, secure, and reliable cloud services, keeping system operations within Service Level Objectives.
  • Implement and manage monitoring, alerting, and observability systems using Prometheus, Grafana, and Azure Monitor to proactively identify and resolve issues.
  • Develop and maintain automation scripts and tools in PowerShell, Bash, and C# to improve deployment efficiency, system reliability, and developer productivity.
  • Create, refine, and maintain detailed runbooks for production systems to ensure consistent operational procedures and effective incident response.
  • Define and manage Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets to measure and maintain system reliability.
  • Collaborate with software engineers and automation engineers to integrate reliability practices into CI/CD pipelines using Azure DevOps.
  • Design and implement intelligent alerting strategies that ensure high signal-to-noise ratios and enable rapid triage of critical issues.
  • Participate in incident response, post-incident reviews, and blameless root cause analysis to drive continuous improvement of system reliability and uptime.
  • Contribute to deployment strategy evolution, including blue-green and canary deployments, to minimize downtime and release risk.
  • Collaborate closely with Automation Engineers to enhance automated validation and testing of production environments.
  • Monitor system health, capacity, and performance, providing data-driven insights and recommendations for optimization.
  • Conduct chaos engineering experiments and resilience testing to proactively identify and address system weaknesses.
  • Develop and maintain disaster recovery and business continuity plans, including regular failover testing.
  • Participate in the on-call rotation for platform services, ensuring high availability and rapid incident resolution.
  • Proactively monitor and respond to production support tickets and alerts within established SLA timeframes, delivering first-level diagnosis, troubleshooting, and escalation as needed to maintain system reliability.
  • Continuously improve incident response playbooks and reduce Mean Time to Recovery (MTTR).
  • Participate in sprint planning, stand-ups, and retrospectives to ensure alignment with development and operational objectives.
  • Identify opportunities to improve resiliency, reduce toil, and strengthen the reliability culture across the engineering organization.
  • Collaborate with security and compliance teams to ensure infrastructure meets regulatory and security standards.
  • Support cost optimization efforts by monitoring cloud resource usage and recommending efficiency improvements.
  • Explore and integrate AI/ML-based observability tools for predictive monitoring and anomaly detection.

Requirements

  • 8+ years of professional experience in site reliability, infrastructure, or systems engineering roles.
  • Proficiency with Azure cloud infrastructure, services, and resource management.
  • Experience with Microsoft and Linux operating systems, Active Directory, and network concepts, protocols, and architecture, including the OSI model.
  • Technical ability in Node.js and .NET / C#, and knowledge of both current and legacy architecture, software development practices, and conventions.
  • Strong experience with REST APIs.
  • Hands-on experience with containerization and orchestration using Kubernetes and microservices architecture.
  • Strong automation and scripting skills in PowerShell and Bash.
  • Experience with Infrastructure as Code tools for provisioning and configuration management.
  • Deep understanding of CI/CD processes and tools, preferably using Azure DevOps.
  • Experience implementing and managing observability solutions, including Azure Monitor, Application Insights, Log Analytics workspaces, Prometheus, and Grafana.
  • Strong problem-solving, analytical, and troubleshooting abilities in distributed systems and cloud environments.
  • Ability to write, maintain, and execute operational runbooks and automation for incident management and recovery.
  • Ability to work in a self-directed manner and to plan and execute projects involving multiple technical resources and stakeholders.
  • Excellent communication and collaboration skills, with the ability to work across software development, infrastructure, and operations teams.
Preferred Education and Experience

  • Bachelor’s degree in Computer Science, Engineering, or related technical field.
  • Experience working in Agile / Scrum delivery environments.
  • Experience supporting .NET applications and microservices in a production environment.
  • Experience supporting SQL Server and Cosmos DB applications in production environments.
  • Knowledge of network fundamentals, load balancing, and high-availability architectures.
Supervisory Responsibility

This position does not include supervisory responsibilities but provides mentorship and technical guidance to the Site Reliability team members.

Travel Requirements

Yearly travel to the San Diego office headquarters is expected for this position.

Work Environment

This job operates in a professional office environment. This role routinely uses standard office equipment such as computers, phones, photocopiers, filing cabinets, and fax machines. This role occasionally must lift and carry office equipment.

Physical / Mental Demands

  • Physical – This is largely a sedentary role.
  • Mental – Problem-solving, making decisions, interpreting data, organizing, reading / writing.
  • Reasonable accommodation may be made to enable individuals with disabilities to perform the essential functions.
Work Location

Due to state law and tax implications, remote work candidates must live and work in one of the following states: California, Washington, Texas, Tennessee, Florida, Colorado, or New York.

Benefits

  • Paid Time Off, Paid Sick Leave, Paid Holidays
  • Competitive Medical, Dental, Vision, and Life Insurance
  • 401(k) plan with discretionary match available
  • Flexible Spending Account (FSA), Health Savings Account (HSA)
  • Voluntary benefits including Critical Illness, Group Accident, and Voluntary Life
  • Employee Referral Program
  • Exposure to a growing ecommerce company
  • Discounts on the GOVX website
Salary Range

$165,000 - $175,000 annually

AAP / EEO Statement

EOE. Veterans / Disabled. Reasonable accommodation may be made to enable individuals with disabilities to perform the essential functions.

Position will require successful completion of a background check and drug testing prior to starting employment.

About GOVX, Inc.

Savings for Those Who Serve

GOVX was founded in 2011 to offer exclusive benefits to those who serve our country. The GOVX membership comprises current and former members of the United States military, law enforcement, firefighting, medical services, and government personnel. We are dedicated to supporting these communities and to offering unique value to our members, while delivering an authentic platform for brands to reach our growing customer base. As the largest and fastest-growing digital platform serving this deserving audience, we are committed to stretching the limits of ecommerce to deliver the best assortment for our members’ on-duty and off-duty needs.
