Principal Site Reliability Engineer

DMV IT Service, Washington, DC, US

12 hours ago

Job type

Full-time

Job description

Job Title : Principal Site Reliability Engineer

Location : Washington, D.C.

Employment Type : Contract

About Us :

DMV IT Service LLC, founded in 2020, is a trusted IT consulting firm specializing in IT infrastructure optimization, cybersecurity, networking, and staffing solutions. We partner with clients to achieve technology goals through expert guidance, workforce support, and innovative solutions. With a client-focused approach, we also provide online training and job placements, ensuring long-term IT success.

Job Purpose :

We are seeking a highly skilled Principal Site Reliability Engineer to lead and elevate the reliability, scalability, and security of critical infrastructure systems. This position requires a seasoned technical professional with deep expertise in infrastructure automation (IaC), CI/CD architecture, and cloud security, combined with hands-on experience in Site Reliability Engineering (SRE) principles such as SLOs, error budgets, and incident management. The ideal candidate will provide technical leadership, mentor cross-functional teams, and ensure systems are built for performance, resilience, and efficiency.

Requirements

Key Responsibilities :

Reliability & Operations : Establish and manage Service Level Objectives (SLOs) and Service Level Indicators (SLIs); oversee incident response, root cause analysis, and continuous service improvement initiatives.
Infrastructure Automation : Architect and manage scalable and secure cloud infrastructures using Infrastructure-as-Code (IaC) tools such as Terraform, Ansible, and CloudFormation.
CI/CD Optimization : Build and optimize secure CI/CD pipelines (e.g., GitHub Actions, Jenkins) with automated rollbacks, canary and blue-green deployments, and artifact validation processes.
Observability & Monitoring : Develop advanced observability systems by creating dashboards, configuring alerts, and implementing synthetic checks for complete system visibility.
Security Integration : Embed security testing and compliance tools (SAST, DAST, SBOM, secret scanning) into deployment workflows and enforce security policies-as-code.
Cost & Capacity Management : Track and optimize cloud costs, manage capacity planning, and ensure efficient infrastructure utilization and uptime.
Platform Enablement : Develop self-service tools and shared frameworks that enhance developer efficiency and maintain delivery consistency.
Leadership & Mentorship : Act as a technical leader, mentor engineering teams, and champion best practices in reliability, automation, and secure delivery.

Required Skills & Experience :

Bachelor’s degree in Computer Science, Engineering, or related field.

At least 5 years of experience in SRE, DevOps, or Platform Engineering, with leadership in reliability and automation.

Minimum 3 years managing production-grade cloud systems using modern security and observability tools.

Strong expertise in AWS, Azure, or GCP, especially in Compute, Networking, and IAM.

Hands-on proficiency with Terraform, CloudFormation, Kubernetes, and Docker.

Solid background in Linux systems, shell scripting, and programming in Python, Go, or Bash.

Proficient with observability tools such as Prometheus, Grafana, ELK, Datadog, or CloudWatch.

Proven experience designing and managing secure CI/CD pipelines and GitOps workflows.

Deep understanding of SRE practices, including chaos engineering, SLO/SLA management, and capacity modeling.

Strong documentation, communication, and leadership skills with a record of improving operational standards.
