Site Reliability Engineer

ClientMind Recruiting Inc.
Bethesda, MD, United States
6 hours ago
Job type
  • Full-time
Job description

ClientMind Recruiting is searching for a Site Reliability Engineer for a growing tech company based in the Bethesda, MD area. The role is onsite one day per week (Tuesday).

This role centers on maintaining the “common” IaC constructs (Python-based abstractions in AWS CDK and CDK8s) that define the company’s platform. These include networking, EKS configuration, data stores, observability, autoscaling patterns, and deployment primitives. You’ll work closely with backend engineers to make infrastructure safe, consistent, and easy to adopt.

Responsibilities

  • Design, implement, and evolve shared CDK and CDK8s constructs used by multiple services and teams.
  • Maintain base infrastructure components: VPC, EKS, node groups, RDS, OpenSearch, and MSK.
  • Operate and extend Kubernetes cluster add-ons: ingress controllers, cert‑manager, autoscaler, and monitoring/logging stacks.
  • Ensure high reliability through well‑structured alerting (Prometheus, CloudWatch), autoscaling, and recovery patterns.
  • Manage and publish baseline templates, configuration schemas, and documentation for infrastructure usage.
  • Own the CI/CD processes for IaC codebases and platform component releases.
  • Collaborate with engineering teams to diagnose infrastructure issues and propose robust solutions.
  • Apply SRE principles (SLIs/SLOs, observability, fault tolerance) to all shared platform services.
  • Support IAM roles, secrets management, and tenant isolation patterns.

Required Experience

  • 5+ years of infrastructure or SRE experience, including AWS (VPC, IAM, RDS, MSK, S3) and Kubernetes (Helm, RBAC, ServiceAccounts).
  • Fluency in Python and experience with Infrastructure-as-Code using AWS CDK, CDK8s, or equivalent frameworks.
  • Strong understanding of Prometheus, Grafana, and alert routing practices.
  • Experience designing reusable infrastructure patterns or internal developer platforms.
  • Proven ability to improve reliability through automation, monitoring, and operational best practices.

Nice to Have

  • Experience supporting Spark on Kubernetes, Argo, or Kafka‑based batch pipelines.
  • Awareness of cost‑efficiency strategies across EC2, storage, and autoscaling.