Overview :
TekWissen is a global workforce management provider headquartered in Ann Arbor, Michigan, that offers strategic talent solutions to our clients worldwide. Our client is a provider of digital technology and transformation, information technology, and services.
Position : Senior Site Reliability Engineer (SRE) / DevOps Engineer
Location : Aliso Viejo, CA
Duration : 11 Months
Job Type : Temporary Assignment
Work Type : Onsite
Job Description
- We are seeking a highly experienced SRE / DevOps Engineer to support and scale a Kubernetes-based API Gateway platform built on a Java technology stack.
- The role focuses on reliability, observability, automation, and performance, while also contributing to proofs of concept (POCs) for next-generation AI Gateway capabilities.
Key Responsibilities
Platform Reliability & Operations
- Own the reliability, availability, scalability, and performance of API Gateway services running on Kubernetes
- Design and implement SRE best practices, including SLIs, SLOs, SLAs, error budgets, and incident management
- Lead production readiness reviews, root cause analysis (RCA), and post-incident improvements
- Drive capacity planning, performance tuning, and resilience testing

Kubernetes & Cloud Engineering
- Manage and optimize Kubernetes clusters (EKS / AKS / GKE / on-premises)
- Develop and maintain Helm charts, manifests, and deployment strategies
- Implement rollout strategies such as blue-green, canary, and rolling deployments
- Collaborate with development teams to ensure applications follow cloud-native design patterns

Observability & Monitoring (Strong Focus)
- Build and maintain enterprise-grade observability (O11y) solutions:
  - Prometheus & Grafana for metrics and dashboards
  - Splunk for centralized logging and alerting
  - OpenTelemetry for distributed tracing
- Define actionable alerts and dashboards for platform and application health
- Improve MTTR through better visibility and automation

CI / CD & Automation
- Design and maintain CI / CD pipelines (Jenkins, GitHub Actions, GitLab CI, etc.)
- Automate infrastructure using Infrastructure as Code (Terraform, CloudFormation, etc.)
- Develop automation scripts in Python, Bash, or Groovy

Security & Compliance
- Implement DevSecOps practices, including secrets management, image scanning, and RBAC
- Work closely with security teams on vulnerability remediation and compliance controls

Innovation & POCs
- Actively contribute to POCs for AI Gateway / Intelligent API Gateway initiatives
- Evaluate and prototype integrations with AI/ML-driven routing, observability, and security features
- Stay current with emerging SRE, cloud, and AI gateway technologies

Required Skills & Qualifications
Must Have
- 7 to 8 years of experience in SRE / DevOps / Platform Engineering
- Strong hands-on experience with Kubernetes in production environments
- Solid understanding of Java-based applications and JVM performance considerations
- Deep expertise in Splunk, Prometheus, Grafana, and observability practices
- Experience operating API Gateway platforms (Kong, Apigee, NGINX, Istio, etc.)
- Strong Linux fundamentals and networking knowledge (TCP/IP, DNS, HTTP, TLS)
- Experience with cloud platforms (AWS / Azure / GCP)

Nice to Have
- Experience with OpenTelemetry and distributed tracing
- Exposure to AI Gateway / Intelligent Traffic Management concepts
- Experience with a service mesh (Istio / Linkerd)
- Kubernetes certification (CKA / CKAD) or cloud platform certification

Soft Skills
- Strong troubleshooting and problem-solving skills
- Ability to work cross-functionally with developers, architects, and security teams
- Proactive mindset with a passion for automation and reliability
- Good documentation and communication skills

TekWissen Group is an equal opportunity employer supporting workforce diversity.
Key Skills
Kubernetes, FMEA, Continuous Improvement, Elasticsearch, Go, Root Cause Analysis, Maximo, CMMS, Maintenance, Mechanical Engineering, Manufacturing, Troubleshooting
Employment Type : Full Time
Experience : 7 to 8 years
Vacancy : 1