Overview

We are seeking a Mid-Level Site Reliability Engineer (SRE) to join our growing team in Frisco, TX. This role is an exciting opportunity to contribute to the reliability and performance of smart transportation systems. As an SRE, you will work to ensure our systems are highly available, resilient, and scalable, and help optimize operations across infrastructure and applications.

Responsibilities

System Reliability: Monitor and maintain the health, availability, and performance of critical transportation services.
Incident Management: Respond to incidents promptly, perform root cause analysis, and coordinate efforts to resolve issues quickly and efficiently.
Automation: Develop and implement automation scripts to streamline operational tasks and improve efficiency.
Monitoring & Performance: Set up and maintain monitoring tools to track service health, performance, and resource
Planning: Help assess and plan for capacity, scaling infrastructure to meet the growing demands of the transportation systems.
Collaboration: Work with software engineering, infrastructure, and operations teams to improve the reliability of systems and services.
System Optimization: Identify performance bottlenecks, troubleshoot issues, and work on optimizations at both infrastructure and application layers.
Continuous Improvement: Contribute to the ongoing improvement of operational processes, documentation, and best practices in the SRE team.
Disaster Recovery: Participate in designing and testing disaster recovery plans to ensure the continuity of critical services.

This list of responsibilities might not cover everything you'll end up doing.

Qualifications

Experience: 3-5 years of experience in Site Reliability Engineering, DevOps, or a similar role, preferably in a mission-critical or large-scale environment.
Technical Skills:
  Experience with cloud platforms (AWS, Azure, GCP) and container orchestration tools (Docker, Kubernetes).
  Proficiency with monitoring and logging tools (Prometheus, Grafana, ELK stack, Datadog, etc.).
  Strong scripting skills in Python, Bash, or Go.
  Solid understanding of Linux and Windows
Knowledge: Familiarity with relational databases (MySQL, PostgreSQL, etc.) and distributed systems.
Collaboration & Communication: Excellent teamwork and communication skills, with the ability to work across teams to improve service reliability.
Problem-Solving: Strong troubleshooting skills with a proactive, solution-oriented mindset.
Experience in Intelligent Transportation: While not required, familiarity with transportation systems, autonomous vehicles, or real-time data systems is a plus.
Preferred Qualifications:
  Experience with traffic management systems, sensor data processing, or other intelligent transportation systems.
  Knowledge of infrastructure-as-code tools (Terraform, Ansible).
  Exposure to CI/CD pipelines and version control systems (Git, Jenkins, etc.).

Benefits

We offer a Total Rewards plan designed with you and your family's health and wellness in mind that includes:
  Paid days off (i.e. vacation, sick days, bereavement leave)
  Health and Dental plans
  Retirement plans
  Employee and Family Assistance Program (EFAP)
  Employee referral program

We welcome applicants from all backgrounds, regardless of race, color, religion, sex, veteran status, sexual orientation, gender identity, national origin, age, or disability or any other protected characteristics in accordance with applicable federal, state/provincial, and local laws. We're committed to creating a workplace where everyone feels valued and respected.

We appreciate all responses and will acknowledge only those being considered for an interview. We respectfully request no calls or unsolicited resumes from Agencies.
Site Reliability Engineer • Plano, TX, United States