Job Title: Site Reliability Engineer (SRE) Practice Lead
Location: Charlotte, NC (Onsite)
Employment mode: Contract
Job Summary
The SRE Practice Lead is a senior technical leader responsible for building and guiding the SRE function to ensure the highest levels of reliability, availability, and scalability of critical utility infrastructure, control systems, and operational technology. This role focuses on delivering resilient utility services and seamless grid operations through strategic planning, engineering excellence, and cross-functional collaboration. The SRE Practice Lead drives automation, operational efficiency, and compliance with strict regulatory standards unique to the utilities sector.
Key Responsibilities
- Lead and mentor a specialised SRE team dedicated to supporting utility grid systems, energy management platforms, and critical infrastructure.
- Develop and execute the SRE strategic roadmap focused on enhancing grid reliability, disaster recovery, and business continuity aligned with industry regulations.
- Implement rigorous monitoring, alerting, and incident management processes tuned specifically for utility operational technology and control environments.
- Champion automation initiatives to reduce manual toil in infrastructure management, configuration, and routine operational tasks related to utility systems.
- Collaborate closely with electrical engineers, grid operators, control system engineers, and software development teams to embed reliability and resilience by design.
- Oversee capacity planning and performance optimization to ensure utility systems scale efficiently during peak loads and emergency events.
- Lead root cause analysis and post-incident reviews for utility outages, coordinating cross-disciplinary teams to implement corrective actions addressing the unique challenges of utility operations.
- Ensure compliance with industry standards and regulations such as NERC CIP, FERC requirements, and other applicable utility regulatory frameworks by maintaining security controls and audit readiness.
- Drive the adoption of modern cloud-native technologies and infrastructure-as-code approaches to accelerate innovation without compromising reliability.
- Advocate for continuous SRE training and knowledge sharing focused on power systems, smart grid technology, and critical infrastructure reliability.
Required Skills and Experience
- Proven leadership experience managing site reliability engineering within utilities or critical infrastructure environments.
- Expertise in reliability engineering concepts, monitoring tools, automation frameworks, and incident response methodologies tailored for the utilities sector.
- Strong scripting and automation skills with tools such as Terraform, Ansible, Kubernetes, and cloud platforms commonly used in utilities.
- Familiarity with regulatory compliance requirements (e.g., NERC CIP) and security practices applicable to utility operations.
- Excellent communication skills to collaborate effectively with technical and non-technical stakeholders across the utility organisation.
- Ability to translate complex technical challenges into strategic in