Job Title : DevOps / Site Reliability Engineer
Location : Englewood Cliffs NJ (Onsite from day one)
Duration : 12 Months
Mandatory skills : CI / CD, AWS and / or GCP, Python or Bash or Groovy, monitoring tools like Datadog, Ansible, JMeter.
Key Responsibilities
Support and enhance observability (monitoring, logging, alerting) across production systems
Help maintain SLIs / SLOs for key services
Participate in evaluating services for production readiness
Collaborate with development teams to identify reliability risks and improve system architecture
Contribute to automation of operations, including CI / CD pipelines, incident response, and infrastructure provisioning
Participate in incident response and on-call rotations for critical services
Contribute to post-incident analysis and drive reliability improvements
Partner with security, infrastructure, and product teams to support performance, compliance, and operational excellence
Must-Haves
Willingness to work onsite and participate in a 24 / 7 on-call rotation as needed
5 years of experience managing and supporting high-traffic digital platforms
Strong experience with CI / CD pipelines and deployment automation
Experience with cloud platforms such as AWS and / or GCP
Solid scripting skills (e.g. Python, Bash, Groovy)
Hands-on experience with observability and monitoring tools like Datadog, New Relic, AppDynamics, or similar
Understanding of web, mobile, and OTT architectures
Experience supporting large-scale websites, mobile and OTT applications, microservices, APIs, and distributed systems
Experience with infrastructure-as-code tools such as Ansible, Terraform, or Chef
Familiarity with performance testing tools like JMeter or k6
Hands-on experience with debugging tools like Charles Proxy or Fiddler
Preferred Qualifications
Experience working with CDNs (e.g. Akamai) and reverse proxies (e.g. NGINX, Varnish)
Exposure to video streaming platforms
Familiarity with application / infrastructure security controls and best practices
Certifications in SRE, DevOps, or Performance Engineering are a plus
Key Skills
Kubernetes, FMEA, Continuous Improvement, Elasticsearch, Go, Root Cause Analysis, Maximo, CMMS, Maintenance, Mechanical Engineering, Manufacturing, Troubleshooting
Employment Type : Full Time
Experience : years
Vacancy : 1