AI Engineer - Site Reliability Researcher

Traversal • New York, NY, United States

30+ days ago

Job type

Full-time

Job description

About Traversal

Traversal is the AI Site Reliability Engineer (SRE) for the enterprise—already trusted by some of the largest companies in the world to troubleshoot, remediate, and even prevent the most complex production incidents. Our mission is to free engineers from endless firefighting and enable them to focus on creative, high-impact work.

Our roots remain deeply embedded in AI research, and we’re channeling that scientific rigor and creativity into building the premier AI agent lab for the enterprise. What we’re proudest of is assembling the most talented yet nicest group of individuals, from researchers at MIT, Harvard, and Berkeley to world-class engineers from industry (Citadel Securities, Cockroach Labs, Cerebras Systems, Glean, Nuro, Perplexity, Pinecone, and more), to take on one of the hardest problems for AI to solve. Without the entire team, none of this would be possible.

The Role

As an AI Site Reliability Researcher, you’ll play a central role in ensuring the scalability, reliability, and observability of our AI platform. This is a high-impact, cross-functional role where you’ll design systems and processes that keep our AI-driven infrastructure healthy and performant.

We’re entering a phase of rapid growth and scale, driven by the needs of large enterprise customers. That means pressure on everything from deployments to developer workflows. We’re building our own distributed systems, maturing our CI/CD pipelines, and managing complex hybrid environments (SaaS and on-prem). You’ll play a foundational role in establishing the SRE practices that allow us to scale thoughtfully and reliably.

In this role, you’ll define how we do change management across diverse deployment environments, build internal observability from the ground up, and help bring structure to systems that are evolving quickly. You’ll also be a hands-on user of Traversal — your feedback will shape the product directly. And while your focus will be reliability, you’ll collaborate closely with our infra and AI agent teams, with opportunities to influence how AI integrates with real-world production environments.

Responsibilities

Brains of the Product: Distill SRE knowledge into agentic workflows.
System Design & Architecture: Build scalable and resilient infrastructure to support AI observability agents in both cloud and on-prem environments.
Observability: Build systems to monitor logs, metrics, and traces tied to deployments and developer activity. Be a power user of observability tools.
Incident Management: Define and lead our on-call and incident response processes, including alerting, debugging, and postmortems.
CI/CD & Deployment: Design and scale our in-house CI/CD systems to support safe, efficient rollouts across hybrid environments.
Infrastructure Automation: Own our infrastructure-as-code stack and improve automation across deployment and provisioning workflows.

Requirements

Experience as an SRE, infrastructure engineer, or in a similar role in fast-paced environments.

Exceptional debugging skills across complex, distributed systems — proven ability to get to root cause quickly across varied tech stacks.

Strong systems design intuition — understands how observability tools fit into architecture and how to leverage them effectively in incident response.

Experience with observability tools (e.g., Datadog, Grafana, Prometheus, OpenTelemetry) and incident response.

Deep understanding of infrastructure automation and CI/CD systems.

Hands-on experience with Terraform, Kubernetes, and cloud environments (AWS or GCP).

Ability to debug distributed systems and drive system-level improvements.

Experience supporting hybrid cloud/on-prem deployments and complex change management.

Nice to Have

Familiarity with AI infrastructure or supporting ML/LLM workloads in production.

Background in developer productivity tooling or internal platform teams.

Prior experience building systems that connect infra events to developer workflows.

Exposure to agentic systems or AI observability platforms.

Compensation

We offer competitive compensation, startup equity, health insurance, and additional benefits. The U.S. base salary range for this full-time, in-person role in New York is $150,000–$300,000, plus equity and benefits. Our salary ranges are based on location, level, and role. Individual compensation is determined by experience, skills, and job-related knowledge.

Why You Should Join Us

We’ll make sure you’re fully supported with health insurance, a great tech setup, flexible time off, and plenty of in-office snacks. We offer competitive salary and equity packages, and give thoughtful consideration to every hire on our small, high-impact team.

Traversal is fully in-office, 5 days a week, based in New York near Madison Square Park. We have a collaborative, hard-working culture and are energized by building the future of AI-powered software maintenance.

Working here means owning meaningful parts of the product, having the flexibility to move fast, and learning constantly. This is a place to grow your career, make a real impact, and help define a new category of infrastructure software.
