Senior Manager, Site Reliability

Cambium Learning
Dallas, Texas
Remote
Full-time

Overview:

As the Senior Manager of Site Reliability, you will play a crucial role in ensuring the stability, performance, and security of our SaaS applications.

You will lead a team of skilled professionals responsible for maintaining and enhancing the reliability of our systems through robust observability, monitoring, threat detection, and mitigation strategies.

The ideal candidate will bring extensive experience in managing complex SaaS environments and a deep understanding of best practices in site reliability engineering.

Job Responsibilities

Team Leadership:

Lead and mentor a team of site reliability engineers to ensure a high level of expertise and efficiency.

Drive initiatives to enhance the technical skills and efficiency of the team.

Foster a culture of collaboration, innovation, and continuous improvement.

Hands-On Technical Leadership:

Actively contribute to the design, implementation, and maintenance of observability, monitoring, and security systems.

Lead by example, working hands-on to troubleshoot issues and optimize system performance.

Observability and Monitoring:

Develop and implement comprehensive observability and monitoring strategies to proactively identify and address potential issues before they impact system performance.

Collaborate with development leadership to improve the performance and scalability of services by providing relevant, actionable metrics in the early stages of development.

Utilize industry-leading tools and practices to maintain visibility into the health and performance of our systems.

Threat Detection and Mitigation:

Design and implement robust security measures to detect and mitigate potential threats to our SaaS infrastructure.

Stay informed about the latest cybersecurity threats and trends, and implement proactive measures to safeguard our systems.

Incident Response:

Actively participate in incident response activities, leading the team to quickly resolve and learn from incidents.

Develop and maintain incident response plans to ensure a rapid and effective response to any service interruptions or security incidents.

Conduct post-incident analyses to identify root causes and implement preventive measures.

Infrastructure Optimization:

Collaborate with cross-functional teams to optimize the performance and scalability of our infrastructure.

Implement automation and efficiency improvements to enhance overall system reliability.

Job Requirements

  • Bachelor's degree in Computer Science, Information Technology, or a related field.
  • Proven hands-on experience (5+ years) in a site reliability engineering or similar role.
  • Leadership experience (3+ years) with a focus on technical mentorship and skill development.
  • In-depth knowledge of observability tools, monitoring systems, and security best practices.
  • Proven leadership and team management skills.
  • Excellent problem-solving and communication abilities.
  • In-depth experience with AWS.

An Equal Opportunity Employer

30+ days ago