Talent.com
Member of Technical Staff, LLM Evaluation

Microsoft • Boulder, Colorado, United States
30+ days ago
Salary
$188,000.00 yearly
Job type
  • Full-time
Job description

Overview


As a Member of Technical Staff, LLM Evaluation, you will develop and implement cutting-edge methodologies to help us evaluate how well Copilot performs in real-world usage scenarios. Users turn to Copilot for all types of endeavors, making it critical that we ensure our AI systems effectively help them meet their needs. Our vision for meeting user needs is expansive and includes not only task completion, but also affective aspects of the experience. You will be responsible for developing new methods to evaluate LLMs, training classifiers, experimenting with data collection techniques, and implementing methodologies to provide real-time signals on Copilot performance. We're looking for outstanding individuals with experience in the social sciences, machine learning, and analysis of natural language. The right candidate is a creative problem solver who will work closely with user researchers and product leaders to build automated evaluation frameworks that help us drive improvements in Copilot.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Starting January 26, 2026, MAI employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles (U.S.) or 25 miles (non-U.S., country-specific) of that location. This expectation is subject to local law and may vary by jurisdiction.
Responsibilities
  • Leverage expertise to measure the performance of Copilot and to identify failure modes and novel mitigation strategies, using techniques including data mining, prompt engineering, LLM-as-a-judge, and classifier training.
  • Solve problems creatively, navigate complexity with clarity, and independently shape direction and deliver results even when the path isn’t obvious.
  • Create and implement comprehensive evaluation frameworks across diverse scenarios, edge cases, and potential failure modes.
  • Build automated testing systems, generalize solutions into repeatable frameworks, and write efficient code for model pipelines and intervention systems.
  • Maintain a user-oriented perspective by understanding users' needs, validating approaches through user research, and serving as a trusted advisor on AI matters.
  • Track advances in research, identify relevant state-of-the-art techniques, and adapt algorithms to drive innovation in production systems serving millions of users.

Qualifications
Required Qualifications
  • Bachelor’s Degree in Computer Science, Statistics, Economics, Psychology, Linguistics or related technical discipline AND 4 years technical engineering experience with coding in languages including Python and SQL.
  • Experience prompting and working with large language models.
  • Experience writing production-quality Python code.

Preferred Qualifications
  • Demonstrated interest in Responsible AI.

Data Science IC4 - The typical base pay range for this role across the U.S. is USD $119,800 - $234,700 per year. There is a different range applicable to specific work locations within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $158,400 - $258,000 per year.
Data Science IC5 - The typical base pay range for this role across the U.S. is USD $139,900 - $274,800 per year. There is a different range applicable to specific work locations within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $188,000 - $304,200 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
Microsoft will accept applications and process offers for these roles on an ongoing basis.
Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect