Talent.com
No longer accepting applications
Engineering Manager, Site Reliability

Reddit • San Francisco, CA, US
1 day ago
Job type
  • Full-time
Job description

As an Engineering Manager for Site Reliability, you will be responsible for ensuring the reliability, performance, efficiency, and resilience of your team's systems and services, as well as working to ensure that the experience of your customers (other internal engineering teams) steadily improves. This includes implementing and maintaining monitoring systems, collaborating with cross-functional teams to address performance bottlenecks, and continuously improving the reliability and scalability of our systems to meet the evolving needs of our users. You will find and apply leverage to improve the reliability of all internal services at Reddit, including the ability of SREs and other engineering teams supporting production services to do incident response well.

How You'll Have Impact

You will report to the Director of Site Reliability Engineering, and your peers will be the other Engineering Managers in SRE and Infrastructure. Your customers will be the other engineering teams at Reddit, who build on our standard infrastructure. You will partner with stakeholders to understand your team's service priorities and contribute to the design, development, and adoption of reliable and performant services within your area.

You will be accountable for building, growing, and mentoring a team of engineers to help Reddit reach its goal of bringing community and belonging to everyone.

What You'll Do

  • Set clear goals for your team, define success metrics, track progress toward those goals, and contribute to the overall Site Reliability Engineering strategy.
  • Support multiple Reddit product teams with expertise in cloud compute infrastructure (Kubernetes, etc.) to optimize availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning. Ensure our infrastructure can scale to a large multiple of what it is today.
  • Establish a stronger Platform-Product interface for feature tracking and prioritization, and provide an opinionated and trusted voice for guiding these decisions.
  • Coordinate across product and engineering teams to understand and widely socialize Reddit's SRE priorities across all of our products.
  • Support the reliable operation of these systems as a Platform for Reddit products, and allow us to rapidly deliver reliable, performant, and efficient services to our end users.
  • Evolve our backend tech stack using modern, internally supported options (Golang, Redis, etc.).
  • Support the full talent lifecycle for your team, including hiring, culture-building, growth, and performance management. Coach and develop your direct reports, providing career development plans and fostering a growth-oriented team environment.
  • Provide mentorship and growth opportunities for team members and leaders to evolve in their roles.
  • Set and support a culture of metrics-driven quality, with efficient processes and strong transparency.
  • Drive a cycle of virtuous improvement with blame-free postmortems.

What We Look For

  • 2-4 years of experience leading and developing high-performing site reliability or infrastructure engineering teams, including setting clear goals, tracking metrics, supporting career growth, and coaching distributed, remote teams to high performance.
  • 7+ years of experience developing internet-scale software, with a strong focus on cloud infrastructure and deployment systems. Experience with Go, Kubernetes, Argo, and Flux is a plus.
  • Strong technical judgment and ability to evaluate the quality of engineering decisions related to cloud infrastructure systems (Kubernetes, AWS, GCE). Accountability for the team's technical output and operational decisions.
  • Experience designing, deploying, building or managing distributed systems of significant scale.
  • Track record of assembling a high-functioning team.
  • Strong organizational skills and the ability to prioritize tasks and keep projects on schedule.
  • BS degree in Computer Science or a similar technical field of study, or equivalent practical experience.

Benefits

  • Comprehensive Healthcare Benefits and Income Replacement Programs
  • 401k Match
  • Family Planning Support
  • Gender-Affirming Care
  • Mental Health & Coaching Benefits
  • Flexible Vacation & Reddit Global Days Off
  • Generous Paid Parental Leave
  • Paid Volunteer Time Off

The base pay range for this position is $217,000 - $303,900 USD.
