Talent.com
Site Reliability Manager

Macmillan Learning, New York, NY, United States
30+ days ago
Job type
  • Full-time
Job description

The Site Reliability Manager (SRM) maintains the availability, reliability, and performance of internal applications and SaaS platforms. This role involves managing incidents, optimizing system performance, and ensuring operational excellence through automation and monitoring strategies.

What you'll do:

  • Lead incident management processes, ensuring swift resolution and communication during outages. Conduct root cause analyses and implement preventive measures.
  • Design and maintain robust monitoring systems for internal and third-party applications, establishing SLIs, SLOs, and SLAs.
  • Automate operational tasks and develop self-healing systems to reduce manual intervention.
  • Collaborate with cross-functional teams and vendors to maintain system performance and address potential reliability issues proactively.
  • Provide leadership in system performance reporting, ensuring proactive communication with stakeholders on system health, ongoing initiatives, incident updates, and post-resolution analysis.

What you'll bring:

  • Expertise with monitoring tools (e.g., Splunk, Azure Monitor) and cloud platforms (e.g., Azure, AWS).
  • Familiarity with ITIL frameworks and advanced automation practices.
  • Strong scripting skills (e.g., Python, Bash) and familiarity with Infrastructure as Code tools.
  • Excellent problem-solving and communication skills.
Ideal experience:

  • Proven experience (5+ years) in Site Reliability Engineering, DevOps, or related fields.
  • Experience with ServiceNow and PagerDuty (or similar tools).
  • Experience managing SaaS platforms like Google Workspace.

This role will have an annual salary of $120k-$130k.

    Macmillan Publishers is the U.S. trade company that is part of the Holtzbrinck Publishing Group, a large family-owned group of media companies headquartered in Stuttgart, Germany. Holtzbrinck Publishing Group's publishing companies include prominent imprints around the world that publish a broad range of award-winning books for children and adults in all categories and formats.

    U.S. publishers include Celadon Books, Farrar, Straus and Giroux, Flatiron Books, Henry Holt & Company, Macmillan Audio, Macmillan Children's Publishing Group, The St. Martin's Publishing Group, and Tor Publishing Group. In the UK, Australia, India, and South Africa, companies in the Holtzbrinck Publishing Group publish under the Pan Macmillan name. The German publishing company, Holtzbrinck Deutsche Buchverlage, includes among its imprints S. Fischer, Kiepenheuer & Witsch, Rowohlt, and Droemer Knaur.

    We are an Equal Opportunity Employer. We are actively seeking job applicants who reflect a broad representation of differences, including race, ethnicity, religion, sex, sexual orientation, gender identity / expression, physical ability, neurodiversity, age, family status, economic background and status, geographical background and status, and perspective. We believe that the best companies reflect the incredible diversity in viewpoints, backgrounds, and identities of the world in their staffs, and are committed to inclusive hiring across departments and levels. The successful candidate for this position will be an employee of Macmillan Publishing Group, LLC.

    Equal Opportunity Employer

    This employer is required to notify all applicants of their rights pursuant to federal employment laws.

    For further information, please review the Know Your Rights notice from the Department of Labor.
