Job Description
SRE / Platform Engineer
Full Time
Hybrid - 2 days onsite / 3 WFH (Reston VA)
Role Summary
Hands-on SRE / Platform Engineer who bridges the on-prem OpenShift platform with a new Azure analytical environment. This role focuses on cluster reliability, automation, and optimization, while partnering with development, infrastructure operations, security, and data teams to enable delivery through modern DevSecOps practices. The role supports platform infrastructure that drives trillions of dollars in economic activity every year. The ideal candidate enjoys working with a variety of technologies, is self-starting, and leads with an eye toward continuous improvement.
Key Focus Areas
- Operate, tune, and optimize OpenShift / Kubernetes clusters
- Operate and manage analytics-focused Azure services (compute, networking, storage, data)
- Bridge on-prem to cloud: migrate, integrate, and support a hybrid service model
- Support the needs of platform stakeholders with platform tools, products and services
Technologies
- Automation & IaC: Terraform, Ansible, GitOps
- Observability: Datadog, Prometheus, Grafana
- Networking & ingress: Nginx, service meshes, container networking
- Messaging platforms: Kafka, AMQ
- Secrets & access management: HashiCorp Vault
- CI / CD & pipeline design: ArgoCD, Jenkins, GitHub Actions (or similar)
- Scripting / coding: Bash, Python, Go
Collaboration & Leadership
- Design and maintain delivery pipelines to support dev and data teams
- Lead incident response, RCA, and postmortems with process improvements
- Partner with developers and various domain-specific engineers to deliver reliable platform services
Must-Have Qualifications
- 2+ years of hands-on experience managing and operating Kubernetes and OpenShift clusters.
- Strong background with Microsoft Azure services (compute, networking, storage, data services).
- Proven experience with automation and Infrastructure-as-Code tools (Terraform, Ansible, GitOps).
- Proficiency in observability and monitoring tools (Datadog, Prometheus, Grafana).
- Scripting and coding skills in Bash, Python, or Go.
Preferred / Stand-Out Skills
- Experience bridging on-premise and cloud environments in hybrid service models.
- Expertise with Kafka / AMQ messaging platforms, HashiCorp Vault, and CI / CD tools such as ArgoCD, Jenkins, or GitHub Actions.
- Background in leading incident response and postmortems, with focus on root cause analysis and continuous improvement.