Product Manager For Agent Evaluation Platform
Everyone's building AI agents now, but here's the problem: nobody really knows if their agents are actually working well. Sure, you can see that your agent completed a task, but did it solve the user's actual problem? Did it deliver real business value, or just go through the motions? Right now, most people test their agents manually, which doesn't scale and isn't reliable.
The Agent Evaluation Platform (name still to be decided) will automatically evaluate agent performance: not just "did it finish the task" but "did it achieve the outcome the user actually wanted." Think of it like Langfuse, but instead of testing individual prompts, we're evaluating entire agent workflows, complex chains of actions, and multi-agent systems.
This is especially important as companies start paying for agents based on outcomes rather than just usage. You need to know your agent is actually delivering value, not just burning through API calls.
What You'll Do
As a Product Manager, you'll figure out how to turn agent evaluation research into a real product. That means understanding how companies currently test their AI systems, what they're missing, and how our platform can fill that gap. You'll work with our engineering team to build evaluation pipelines that can assess everything from simple chatbots to complex multi-step agent workflows. You'll also need to find early customers: companies building AI agents that are frustrated with current testing methods.
Who You Are
What We Offer
Upload your resume and tell us a few words about yourself. We'd love to hear from you!
Product Manager Platform • Winston-Salem, NC, US