1\. Role Overview

Mercor is collaborating with a leading AI research team to advance DeepResearch-2-App pipelines that simulate real-world code generation tasks. We’re seeking senior-level software engineers to serve as independent evaluators and supervisors in this process. You’ll help assess and refine AI-generated code across a wide range of domain-specific scenarios, with a focus on feasibility, functionality, and test coverage. This is a part-time, project-based contract ideal for highly experienced engineers looking to contribute to cutting-edge AI evaluation.

2\. Key Responsibilities

- Review domain-generated prompts and assess their feasibility from a coding perspective
- Supervise model outputs and validate Dockerfile execution
- Design and implement 40–60 unit tests per evaluation set
- Review peer-generated unit tests for completeness and robustness
- Execute unit tests and confirm code performance and reliability

3\. Ideal Qualifications

- 6+ years of professional software engineering experience
- Deep specialization in backend or full-stack development, with testing and evaluation experience
- Strong ability to assess technical feasibility and debug complex systems
- Experience with Docker and automated testing frameworks
- Detail-oriented mindset and ability to provide structured technical feedback

4\. More About the Opportunity

- Remote and asynchronous: set your own schedule
- Estimated workload: ~20 hours per week
- Project-based contract, with ongoing need for evaluations

5\. Compensation & Contract Terms

- $120 / hour for all services rendered
- Paid weekly via Stripe Connect
- You’ll be classified as an independent contractor

6\. Application Process

- Submit your resume to get started
- Complete a brief form to detail your technical expertise
- If selected, you’ll receive onboarding materials and sample tasks