Overview
Turing is one of the world’s fastest-growing AI companies, pushing the boundaries of AI-assisted software development. Our mission is to empower the next generation of AI systems to reason about and work with real-world software repositories. You’ll be working at the intersection of software engineering, open-source ecosystems, and frontier AI.
Project Overview
We're building high-quality evaluation and training datasets to improve how Large Language Models (LLMs) interact with realistic software engineering tasks. You will have the opportunity to work on a diverse range of projects, from helping models traverse complex code bases to building agents that improve model performance.
What Does a Typical Day Look Like?
- Work across multiple projects to improve LLM performance on code. Sample projects:
- Lead and deliver end-to-end agent use cases such as home automation agents, coding copilots, or creative design assistants.
- Collaborate with the team to identify edge cases and ambiguities in model behavior.
- Review and compare 3–4 model-generated code responses per task using a structured ranking system.
- Evaluate code diffs for correctness, code quality, style, and efficiency. Provide clear, detailed rationales explaining the reasoning behind each ranking decision.
Required Skills
- Several years of software engineering experience, including 2+ years of continuous full-time experience at a top-tier product company (e.g., Google, Stripe, Amazon, Apple, Meta, Netflix, Microsoft, Datadog, Dropbox, Shopify, PayPal, IBM Research).
- Strong expertise in building full-stack applications and deploying scalable, production-grade software using modern languages and tools.
- Deep understanding of software architecture, design, development, debugging, and code quality / review assessment.
- Proven ability to review code diffs and evaluate correctness, maintainability, and efficiency.
- Excellent oral and written communication skills for clear, structured evaluation rationales.
- Commitment: Flexible engagement, minimum 10 hrs / week, up to 40 hrs / week (partial PST overlap required).
- Type: Contractor (no medical / paid leave).
- Duration: 1 month (starting next week; potential extensions based on performance and fit).
- Rates: $50–$150 / hour, based on experience and skill level.
Engagement Details
- Seniority level: Mid-Senior level
- Employment type: Contract
- Job function: Information Technology and Engineering
- Industry: Software Development