Site Reliability Engineer (SRE) With Observability
Experience: 10+ years
We are looking for an experienced Site Reliability Engineer to join our team and ensure our systems remain reliable, scalable, and performant - especially during high-visibility, high-traffic events. This role focuses on proactively preparing infrastructure and services for major events, designing scalable solutions, automating workflows, and acting as a first responder during live incidents.
You will collaborate with engineering, product, and event operations teams to make sure our customers experience smooth, uninterrupted service - even at massive scale.
What You'll Do
Serve as an on-call point of contact during live events
Monitor system health in real time and proactively mitigate performance issues
Rapidly diagnose and mitigate production issues under pressure
Lead post-event reviews, analyzing performance data and incident timelines
Document learnings and recommendations to improve reliability at scale
Site Reliability Engineer (SRE) • Brookville, NY, United States