Site Reliability Engineer - Hybrid Onsite, Irving, TX (3 days/week)
Contract through end of 2025; will extend
Responsibilities
- Design and execute performance tests to evaluate the responsiveness, scalability, and stability of applications.
- Conduct resiliency testing to validate fault tolerance and recovery strategies.
- Implement and monitor observability tools to track system health and detect issues in real time.
- Perform capacity planning and recommend scaling strategies for peak loads.
- Collaborate with developers and operations teams to optimize Java / Spring Boot microservices, database queries, and infrastructure configurations.
- Configure Kubernetes performance parameters (resource limits, requests, autoscaling policies).
- Implement resiliency patterns such as circuit breakers, bulkheads, retries, rate limiters, and fallback mechanisms.
- Document methodologies and provide training on performance and resiliency best practices.
- Continuously evaluate and improve testing and monitoring processes.
Required Technical Skills
- Programming: Strong experience with Java and Spring Boot for microservices.
- Containerization: Hands-on with Docker; experience deploying and tuning containerized applications.
- Scripting: Proficiency in Python and Bash for automation and test scripting.
- Cloud: Solid experience with Azure (mandatory); familiarity with cloud-native architectures.
- Observability / APM Tools: Splunk, ELK stack, AppDynamics (setup, monitoring, troubleshooting).
- Architecture & Resiliency: Knowledge of design patterns, fault tolerance strategies, and distributed systems.
- Microservices Support: Strong background in supporting and optimizing microservices applications.
- Computer Science Fundamentals: Algorithms, data structures, and architectural design best practices.
Preferred Skills
- Experience with Kubernetes (cluster configuration, autoscaling, resource tuning).
- Understanding of networking concepts (DNS, load balancing, firewalls, VPNs).
- Exposure to CI/CD pipelines and DevOps practices.
Key Skills
Kubernetes, FMEA, Continuous Improvement, Elasticsearch, Go, Root Cause Analysis, Maximo, CMMS, Maintenance, Mechanical Engineering, Manufacturing, Troubleshooting
Employment Type: Full Time
Experience: years
Vacancy: 1