Seeking an SRE candidate with a strong Java development background and strong hands-on experience in building dashboards and setting up alerts using Splunk, Grafana and GCL.
Required Qualifications :
- 10+ years of Software Engineering experience, or equivalent demonstrated through one or a combination of the following : work experience, training, military experience, education
- 10+ years of experience in Production support / Site Reliability Engineering teams with continued focus on improving Platform health
- Familiar with Agile or other rapid application development practices
- Hands-on expertise with Automated testing, Process Automation & building dashboards using APM tools.
- Experience with distributed (multi-tiered) systems, algorithms, relational databases, and NoSQL databases.
- Knowledge of and exposure to caching tools (Redis, Memcached) or messaging tools such as MQ, Kafka.
- Must have working knowledge of APM tools such as Splunk, GCL, ELK, Grafana, Prometheus etc.
- Able to create dashboards using GCL / Splunk / ELK and set up alerts.
- Working knowledge of CI/CD is a plus: source control such as Git, continuous integration with Jenkins, UCD Release etc.
- Ability to work with Engineering teams across the ecosystem, such as Security, Networking & Infrastructure, on challenges which can impact platform health & resiliency.
- Shell scripting / DevOps tools like Ansible, with good knowledge of YAML for writing playbooks.
- Experience with distributed storage technologies like NFS as well as dynamic resource management frameworks such as PCF, Kubernetes / OpenShift, AWS or Azure.
- Tech Stack : Java / J2EE (Spring, Spring Boot), Python, Shell Scripting, Kafka, Oracle, MongoDB etc.
- Able to work on shift duty in a 12 / 7 support organization.
Job Expectations :
- You will be a core member of an SRE support team, using the latest technology tools to write code and test cases, work with API specs, and automate in order to maintain the resiliency, performance and availability of Digital Sales & Marketing platforms.
- Strong, relevant experience in supporting Web / API platforms built using a Java / JavaScript stack (Spring / Spring Boot, JavaScript - Angular / React).
- Proficiency in dealing with legacy infrastructure along with cloud infrastructure (on prem & 3rd party) such as PCF or Azure.
- Identify opportunities to adopt new technologies, remove toil, and continue to drive efficiency & optimization.
- Proactive monitoring of app performance through Splunk, app dashboards, AppDynamics, Dynatrace etc.
- Represent Platform engineering teams during production outages and collaborate with engineering teams to resolve them.
- Collaborate with stakeholders across engineering functions to own / derive RCA & work towards permanent resolution.
- Plan, support, execute and comply with governance programs / processes in support of a strong control environment in your functional area. Leverage process documentation to improve operational controls and identify and remediate process deficiencies.
- Proactively identify, communicate, mitigate and escalate risk originating from non-compliance of processes, operational errors, and data integrity issues in all applicable processes.
- Ability to influence SRE practices within and outside teams to enable a strong DevOps culture within the organization.
- Responsible for working with Engineering teams to maintain the SLAs & SLOs. Constantly look for opportunities to improve platform metrics & communicate the same to stakeholders.
- Exposure to and proficiency in different API styles such as SOAP, REST, microservices etc.
- Working knowledge of Unix, Linux and Postman.