Site Reliability Engineer

Agile Datapro, CA, United States
30+ days ago
Job description

About the Job :

  • Role : SRE or Site Reliability Engineer
  • Type of Engagement : Hybrid - 2 days from Mountain View office
  • Location : Mountain View
  • Employment Type : Full Time with Client (W2)

Job Description / Requirement :

Design, implement, and maintain complex data systems that support millions of customers, applying cloud-native principles and best practices to ensure highly available, secure, performant, and scalable database systems

  • Build and maintain CI / CD pipelines in Jenkins
  • Build and deploy services to Kubernetes clusters using Helm, Kustomize, etc.
  • Contribute to infrastructure changes in AWS, drawing on a deep understanding of AWS services
  • Participate in the on-call rotation for pre-production and production systems supporting millions of users
  • Write and review RCA docs to prevent incidents from recurring, and share the learnings
  • Contribute to major system upgrades, deployment automation, monitoring enhancements, and production changes
  • Create operational playbooks, contribute to how-to articles, and gain domain knowledge to drive changes in the team
  • Participate in and contribute to FMEA / chaos testing, security remediations, etc.
  • Share best practices and patterns for operational excellence and cost optimization
  • Reduce or eliminate manual steps by automating as much as possible
  • Continuously look for opportunities to increase developer velocity and productivity

Qualifications :

  • Bachelor’s or master’s degree in computer science or a related technical field. Equivalent experience will be considered
  • 4+ years of hands-on development and operational experience building and maintaining infrastructure in AWS
  • Extensive performance monitoring, troubleshooting & tuning experience
  • Experience with AWS services and hands-on knowledge of hosting in the cloud
  • Experience with scripting languages for DevOps automation
  • Experience with at least one of the following programming languages : Java / Python / Ruby
  • Knowledge of Docker, Kubernetes, and ArgoCD
  • Experience with monitoring and observability using Splunk, Wavefront, AppDynamics, Prometheus, tracing, etc.

Education :

    Bachelor’s degree in computer science, software engineering, or a related field.

    If you are interested in pursuing this opportunity, please send your updated resume to [email protected] along with your rate / salary information.