Talent.com
Staff Site Reliability Engineer
Motion Recruitment • New York, New York, United States
30+ days ago
Job type
  • Full-time
Job description

Staff Site Reliability Engineer
NYC Hybrid with 3 days onsite
Who they are:
Our client is building the AI platform that transforms how insurers evaluate and price risk. Not just another tool that generates summaries or flags issues—they’re creating AI agents that actually understand risk the way veteran underwriters do.

Their platform is processing billions in premium for some of the world’s largest carriers, and they’re just getting started. The technical challenges are wild. They’re teaching AI to understand that a bakery in Florida faces different risks than one in Montana. To know when a manufacturing company’s pivot from toys to medical devices fundamentally changes its risk profile. To make million-dollar decisions with the same intuition as someone who’s been underwriting for 20 years.

What makes our client special isn’t just the technology—it’s that they’re building it with people who deeply understand insurance. Their team includes folks who’ve built and scaled carriers, researchers who’ve pushed the boundaries of AI, and engineers who just love solving seemingly impossible problems. They’re still early, but the impact is already real.

If you want to build AI that matters—that affects real businesses, real people, and billions in economic activity—our client is where you should be. They’re not just digitizing insurance. They’re reimagining what it can be.

What you’ll do:
Lead observability and reliability strategy across the company, moving them from disparate signals to a clear, trusted view of system health by establishing company standards, defining milestones toward higher levels of operational maturity, and building shared ownership. Operationally, you’ll be responsible for leading their disaster recovery exercises and developing plans for higher levels of maturity to meet their evolving business needs.

JD: Staff Reliability and Observability Engineer

  • Own the end-to-end incident and production experience, including on-call design, incident management, post-incident learning, and clear, template-driven customer communication in partnership with Customer Success.

  • Influence reliability at the application and system level, partnering with engineers to improve instrumentation in code, resolve cross-team tradeoffs, and design for failure across interconnected services and vendors.

  • Establish reliability patterns for modern, AI-driven systems, including long-running requests, partial failures, retries, and graceful degradation, while managing key vendor reliability standards.

Qualifications:

  • Senior+ reliability engineering experience, including time as an SRE, Platform Engineer, or Staff-level engineer, with a background that touches both infrastructure (preferably AWS and/or Azure) and application code.

  • Strong application-level fluency, including analyzing logs and traces, and contributing production code (e.g., meaningful PRs) to improve observability and reliability directly in services.

  • System-level thinking across complex ecosystems, with experience operating and reasoning about multiple interconnected services, vendors, and failure modes, and making explicit, well-documented tradeoffs.

  • Proven influence without authority, demonstrated by raising reliability standards through collaboration across Engineering, Product, and Customer Success, navigating disagreement, and driving alignment—paired with practical experience designing for reliability in AI- and LLM-backed systems using modern developer tooling.

Who you are:

  • A smart self-starter: You have a bias for action. You orient yourself around solutions and outcomes and don’t wait for others to tell you what to do. You also understand how to build alignment and conviction for decisions that can’t be easily reversed.

  • A force multiplier: You look for ways to magnify your impact and your team’s. When you find a productivity hack, you share it with teammates and build tools to make it easier. You document knowledge for future teammates and lean into AI and automation to improve productivity.

  • An empathetic communicator: You communicate nuanced ideas clearly, whether explaining technical decisions in writing or brainstorming in real time. You engage thoughtfully with differing perspectives and compromise when needed.

  • A learner: You thrive on learning new things. You stay current on tech and AI and are excited to share your latest discoveries and hacks.
