Site Reliability Engineer

ClientMind Recruiting Inc. • Bethesda, MD, United States

6 hours ago

Job type

Full-time

Job description

ClientMind Recruiting is searching for a Site Reliability Engineer on behalf of a growing tech company in the Bethesda, MD area. The role is onsite one day per week (Tuesdays).

This role centers on maintaining the company's “common” IaC constructs (Python-based abstractions built with AWS CDK and CDK8s) that define its platform. These constructs cover networking, EKS configuration, data stores, observability, autoscaling patterns, and deployment primitives. You’ll work closely with backend engineers to make infrastructure safe, consistent, and easy to adopt.

Responsibilities

Design, implement, and evolve shared CDK and CDK8s constructs used by multiple services and teams.
Maintain base infrastructure components: VPC, EKS, node groups, RDS, OpenSearch, and MSK.
Operate and extend Kubernetes cluster add-ons: ingress controllers, cert-manager, the autoscaler, and monitoring/logging stacks.
Ensure high reliability through well‑structured alerting (Prometheus, CloudWatch), autoscaling, and recovery patterns.
Manage and publish baseline templates, configuration schemas, and documentation for infrastructure usage.
Own the CI/CD processes for IaC codebases and platform component releases.
Collaborate with engineering teams to diagnose infrastructure issues and propose robust solutions.
Apply SRE principles (SLIs/SLOs, observability, fault tolerance) to all shared platform services.
Support IAM roles, secrets management, and tenant isolation patterns.

Required Experience

5+ years of infrastructure or SRE experience, including AWS (VPC, IAM, RDS, MSK, S3) and Kubernetes (Helm, RBAC, ServiceAccounts).

Fluency in Python and experience with Infrastructure-as-Code using AWS CDK, CDK8s, or equivalent frameworks.

Strong understanding of Prometheus, Grafana, and alert routing practices.

Experience designing reusable infrastructure patterns or internal developer platforms.

Proven ability to improve reliability through automation, monitoring, and operational best practices.

Nice to Have

Experience supporting Spark on Kubernetes, Argo, or Kafka‑based batch pipelines.

Awareness of cost‑efficiency strategies across EC2, storage, and autoscaling.

Related jobs

Staff Site Reliability Engineer

Visa • Ashburn, VA, United States

Full-time

Visa is a world leader in payments and technology, with over 259 billion payments transactions flowing safely between consumers, merchants, financial institutions, and government entities in more t...

Last updated: 30+ days ago

Site Reliability Engineer

Anduril Industries • Washington, DC, United States

Full-time

Anduril Industries is a defense technology company with a mission to transform U.[…] By bringing the expertise, technology, and business model of the 21st century’s most innovative companies to the def...

Last updated: 18 hours ago

Lead Site Reliability Engineer

Federated IT • Washington, DC, United States

Full-time

Bridge Defense is redefining how modern defense technology is delivered. […] Department of Defense, the Intelligence Community, and federal law enforcement agencies. We provide full-spectrum national sec...

Last updated: 7 days ago

Site Reliability Engineer

Leidos Inc • Reston, VA, United States

Full-time

The Multi Domain Solutions Division at Leidos is looking for a […]. This role involves supporting the delivery of comprehensive IT and support services to ensure mission success while adhering to DoD st...

Last updated: 19 days ago

Sr. Manager - Site Reliability Engineer

Visa • Ashburn, VA, United States

Full-time

Site Reliability Engineer III

Verisign • Reston, VA, United States

Full-time

Verisign helps enable the security, stability, and resiliency of the internet. We are a trusted provider of internet infrastructure services for the networked world and deliver unmatched performance...

Last updated: 30+ days ago

Site Reliability Engineer

Tax Analysts • Falls Church, VA, US

Full-time

Tax Analysts is seeking a Site Reliability Engineer (SRE) to help establish and shape our reliability engineering practice from the ground up. This is a unique opportunity to join a mission-driven o...

Last updated: 30+ days ago

Site Reliability Engineer (SRE) – TS/SCI Clearance

Tech Cratic • Washington, DC, United States

Full-time

Site Reliability Engineer (SRE) – TS/SCI Clearance. Technology has revolutionized how we approach job hunting, and this book streamlines the process into a fast, efficient system that works. Instead ...

Last updated: 30+ days ago

Senior Reliability Engineer

The Johns Hopkins University Applied Physics Laboratory • Laurel, MD, United States

Full-time

Are you passionate about applying reliability and system engineering principles to analyze and assess the resilience of future strategic weapon systems? Do you have a strong technical background in...

Last updated: 10 days ago

Site Reliability Engineer

Powder River Industries • Washington, DC, United States

Full-time

Conduct analysis of alternatives for configuration tools, make recommendations, work with team to design, develop, test, implement, and maintain tool choice. Responsible for the administration, moni...

Last updated: 7 days ago

Site Reliability Engineer

CSCI Consulting • Quantico, VA, United States

Full-time

CSCI Consulting is looking for a Site Reliability Engineer (SRE). This role combines deep systems engineering knowledge with DevOps automation, proactive monitoring, and incident response practices...

Last updated: 30+ days ago

Site Reliability Engineer

EngFlow • Washington, DC, United States

Full-time

Join to apply for the Site Reliability Engineer role at EngFlow. At EngFlow, we help developers save time by accelerating software builds and tests. Our cloud-based, distributed service optimizes dev...

Last updated: 7 days ago

Lead Site Reliability Engineer

Bridge Defense • Washington, DC, United States

Full-time

Principal Site Reliability Engineer (SRE) at Jobgether Washington DC

Jobgether • Washington, DC, United States

Full-time

Principal Site Reliability Engineer (SRE) job at Jobgether. This position is posted by Jobgether on behalf of […]. We are currently looking for a Principal Site Reliability Engineer (SRE). Join a high-imp...

Last updated: 30+ days ago

Deployment Site Reliability Engineer - Connected Warfare

Anduril Industries, Inc. • Washington, DC, United States

Full-time

Senior Deployed Site Reliability Engineer, Connected Warfare. Washington, District of Columbia, United States. Anduril Industries is a defense technology company with a mission to transform U.[…] By brin...

Last updated: 7 days ago

Site Reliability Engineer

Cape • Washington, DC, United States

Full-time

Cape was founded in early 2022 by Palantir and Anduril alums with deep expertise in privacy and national security. While running Palantir’s US national security business, our CEO became passionate a...

Last updated: 7 days ago

Site Reliability Engineer — Scale mission-critical platforms

Anduril Industries • Washington, DC, United States

Full-time

A defense technology company is seeking a Site Reliability Engineer in Washington, DC. The role involves solving challenges in networking and systems integration while working with cross-functional ...

Last updated: 3 days ago

Staff Site Reliability Engineer (Federal)

Okta • Washington, DC, United States

Full-time

Okta is The World's Identity Company. We free everyone to safely use any technology, anywhere, on any device or app. Our flexible and neutral products, Okta Platform and Auth0 Platform, provide secur...

Last updated: 30+ days ago