Member of Technical Staff, LLM Evaluation

Microsoft • Boulder, Colorado, United States
26 days ago
Job type
  • Full-time
Job description

Overview

As a Member of Technical Staff, LLM Evaluation, you will develop and implement cutting-edge methodologies to help us evaluate how well Copilot performs in real-world usage scenarios. Users turn to Copilot for all types of endeavors, making it critical that we ensure our AI systems effectively help them meet their needs. Our vision for meeting user needs is expansive and includes not only task completion but also the affective aspects of the experience. You will be responsible for developing new methods to evaluate LLMs, training classifiers, experimenting with data collection techniques, and implementing methodologies that provide real-time signals on Copilot performance. We're looking for outstanding individuals with experience in the social sciences, machine learning, and natural language analysis. The right candidate is a creative problem solver who will work closely with user researchers and product leaders to build automated evaluation frameworks that help us drive improvements in Copilot.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Starting January 26, 2026, MAI employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles (U.S.) or 25 miles (non-U.S., country-specific) of that location. This expectation is subject to local law and may vary by jurisdiction.

Responsibilities

  • Leverage expertise to measure the performance of Copilot and to identify failure modes and novel mitigation strategies, using techniques including data mining, prompt engineering, LLM-as-a-judge, and classifier training.
  • Solve problems creatively, navigating complexity with clarity, independently shaping direction, and delivering results even when the path isn’t obvious.
  • Create and implement comprehensive evaluation frameworks across diverse scenarios, edge cases, and potential failure modes.
  • Build automated testing systems, generalize solutions into repeatable frameworks, and write efficient code for model pipelines and intervention systems.
  • Maintain a user-oriented perspective by understanding user needs, validating approaches through user research, and serving as a trusted advisor on AI matters.
  • Track advances in research, identify relevant state-of-the-art techniques, and adapt algorithms to drive innovation in production systems serving millions of users.

Qualifications

Required Qualifications

  • Bachelor’s Degree in Computer Science, Statistics, Economics, Psychology, Linguistics, or a related technical discipline AND 4 years of technical engineering experience coding in languages including Python and SQL.
  • Experience prompting and working with large language models.
  • Experience writing production-quality Python code.

Preferred Qualifications

  • Demonstrated interest in Responsible AI.

Data Science IC4 - The typical base pay range for this role across the U.S. is USD $119,800 - $234,700 per year. There is a different range applicable to specific work locations within the San Francisco Bay Area and New York City metropolitan area, and the base pay range for this role in those locations is USD $158,400 - $258,000 per year.

Data Science IC5 - The typical base pay range for this role across the U.S. is USD $139,900 - $274,800 per year. There is a different range applicable to specific work locations within the San Francisco Bay Area and New York City metropolitan area, and the base pay range for this role in those locations is USD $188,000 - $304,200 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:

Microsoft will accept applications and process offers for these roles on an ongoing basis.

Benefits / perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.

  • Industry-leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect

    Related jobs
    Associate Team Leader

    H&R Block • Golden, CO, US
    Full-time +1
    At H&R Block, we believe in the power of people helping people. Our defining Purpose is to provide help and inspire confidence in our clients, associates, and communities everywhere. We believe in a ...
    Last updated: 30+ days ago
    Remote Audio Generalist Evaluator Expert - AI Trainer ($35-$40 per hour)

    Mercor • Boulder, Colorado, US
    Remote
    Full-time
    Mercor is seeking detail-oriented writing experts to contribute to a high-impact audio AI research project with a leading lab. Freelancers will author prompt–golden answer pairs that train and evalu...
    Last updated: 5 days ago
    Remote Consumer Panel Participant

    Ipsos • Masonville, Colorado, United States
    Remote
    Full-time
    Ipsos iSay, an online research community powered by Ipsos, the world’s third-largest market research company, is looking for enthusiastic. This is your opportunity to influence the future of brands, g...
    Last updated: 8 days ago
    Remote Enrollment Producer - Entry Level

    Global Elite • Boulder, Colorado, United States, 80301
    Remote
    Full-time
    Remote Enrollment Producer - Entry Level. After a record-breaking year with $2. If you are hard-working, motivated, and a team player, then we have a position for you! We work closely with members of ...
    Last updated: 30+ days ago
    Utility Engineer

    ABM Industries • Boulder, CO, US
    Full-time
    The pay listed is the hourly range or the hourly rate for this position. A specific offer will vary based on applicant’s experience, skills, abilities, geographic location, and alignment with market...
    Last updated: 30+ days ago
    Virtual Team Lead

    American Income Life AO • Boulder, CO, US
    Full-time
    APPLICANT MUST RESIDE IN THE U.S. TO BE CONSIDERED FOR THIS POSITION; ALL OTHER APPLICANTS WILL BE IMMEDIATELY DISQUALIFIED. Are you ready to join the forefront of AO’s unparalleled growth in th...
    Last updated: 30+ days ago
    AI Trainer -Remote Content Reviewer

    Outlier • Boulder, CO, United States
    Remote
    Full-time
    Earn up to $15 / hour + performance bonuses. Outlier, a platform owned and operated by Scale AI, is looking for. If you're passionate about improving models and excited by the future of AI, this is you...
    Last updated: 7 days ago
    Travel Behavioral Health Tech in Golden, CO

    AlliedTravelCareers • Golden, CO, US
    Full-time
    AlliedTravelCareers is working with Aequor to find a qualified Behavioral Health Tech in Golden, Colorado, 80401! Registered Behavioral Tech (RBT). With Aequor, you can enjoy the freedom to advance ...
    Last updated: 8 days ago
    Technical Enrollment Manager for Online Education

    LanceSoft • Boulder, CO, US
    Full-time
    Technical Enrollment Manager for Online Education. Job Summary: The Technical Enrollment Manager for Online Education leads strategic and technical initiatives to ensure secure, scalable, and compli...
    Last updated: 30+ days ago
    Technology Associate

    Gpac • Boulder, Colorado, United States
    Full-time
    Mid-Level Technology Associate. Location: Denver or Boulder, CO. Practice Area: Technology Transactions / SaaS / Commercial Contracts. Employment Type: Full-Time, On-Site or Hybrid (depending on firm)...
    Last updated: 27 days ago
    Software Technical Lead

    Infleqtion • Boulder, Colorado, United States
    Full-time
    We are seeking self-motivated, energetic individuals with exceptional problem-solving and technical skills to help drive our. We break down barriers between disciplines, stepping in wherever we can ...
    Last updated: 30+ days ago
    Staff - CT Technologist - UCHealth

    UC Health • Jamestown, CO, United States
    Full-time
    UCHealth (Colorado) is seeking a CT Technologist for a job in Louisville, Colorado. Job Description & Requirements Specialty: CT Technologist Discipline: Allied Health Professional Duration: Ongoing...
    Last updated: 9 days ago
    Life Enrichment Associate II $20.75 - $22.88 / hour Sunday-Thursday

    Christian Living Communities • Indian Hills, CO, US
    Full-time
    Assists in the planning and implementation of meaningful activities, appropriate to the individual needs and interests of the participants. Helps to provide programs as well as one-to-one activities...
    Last updated: 1 day ago
    CT Tech

    Intermountain • Golden, CO, United States
    Full-time
    We at Bestica believe our success is a direct result of hard work and outstanding employee dedication. Our environment is dynamic, friendly, and collaborative. We foster a positive culture, where inn...
    Last updated: 1 hour ago
    Block Advisor Associate Team Leader

    H&R Block • Boulder, CO, US
    Full-time +1
    At H&R Block, we believe in the power of people helping people. Our defining Purpose is to provide help and inspire confidence in our clients, associates, and communities everywhere. We also believe ...
    Last updated: 30+ days ago
    Consumer Panel Participant

    Ipsos • Masonville, Colorado, United States
    Full-time
    Ipsos iSay, an online research community powered by Ipsos, the world’s third-largest market research company, is looking for enthusiastic. This is your opportunity to influence the future of brands, g...
    Last updated: 8 days ago
    Remote Senior Machine Learning Engineer - LLM Evaluation / Task Creations (India Based) - AI Trainer ($21-$21 per hour)

    Mercor • Boulder, Colorado, US
    Remote
    Full-time
    Role Description: Mercor is hiring on behalf of a leading AI research lab to bring on highly skilled Machine Learning Engineers with a proven record of building, training, and evaluating high-...
    Last updated: 2 days ago
    AI Trainer -Remote Content QA Reviewer

    Outlier • Boulder, CO, United States
    Remote
    Full-time
    Earn up to $15 / hour + performance bonuses. Outlier, a platform owned and operated by Scale AI, is looking for. If you're passionate about improving models and excited by the future of AI, this is you...
    Last updated: 7 days ago