Position Summary
Key Responsibilities
AI Evaluation & Transparency
- Manage dashboards connecting capability metrics (ACI) with business metrics (Scorecard) to surface what's working and what isn't.
- Maintain golden datasets for benchmarking model relevance, execution accuracy, personalization, and compliance.
- Partner with AI Tech and Tech Ops to calibrate LLM-as-Judge and ensure automated evaluation remains aligned with human judgment.
Shopper Experience Quality & Audits
- Conduct periodic audits of shopper-facing AI experiences (answers, nudges, recommendations, in-shop chat).
- Define and monitor KPIs for intent recognition, execution accuracy, and personalization precision.
- Run cross-journey evaluations (Events, Essentials, Discovery, Raptor baskets) to identify friction or divergence patterns.
SOP Governance & Continuous Improvement
- Own and evolve SOPs for annotation, labeling, and evaluation to reflect changing features and customer contexts.
- Capture and codify new edge cases (multimodal queries, accessibility, household profiles) into updated labeling guides.
- Ensure all SOPs align with shopper-first outcomes and compliance requirements (ADA, HIPAA, privacy).
Human + AI Divergence Monitoring
- Track variance between human labels, automated judgments, and live model outputs.
- Create structured escalation paths for high-risk failures (irrelevant, unsafe, or biased responses).
- Quantify divergence trends and drive remediation with Product and ML Engineering.
Vendor & Partner Management
- Manage external labeling partners, ensuring they meet throughput, accuracy, and compliance SLAs.
- Deliver training grounded in real Sparky interactions and golden dataset examples.
- Audit vendor output and hold partners accountable for quality metrics.
Closed-Loop Feedback & Model Integration
- Operate the feedback flywheel linking shopper signals (thumbs up/down, drop-offs, basket edits) → human review & annotation → automated LLM evaluation → Product & ML updates.
- Ensure insights translate into measurable improvements in personalization, nudge effectiveness, and conversational trust.
Qualifications
- 2–4 years in AI operations, ML evaluation, or eCommerce data quality; experience managing annotation or human-in-the-loop programs preferred.
- Working knowledge of AI evaluation frameworks (LLM-as-Judge, golden datasets, A/B testing).
- Skilled in KPI design, audit processes, and vendor oversight.
- Analytical storyteller able to link model metrics to business outcomes.
- Obsessed with customer trust, transparency, and continuous improvement.
Minimum Qualifications
Bachelor's degree in information technology, computer science, or related area and 5 years' experience in eCommerce merchandising, site operations, business management, or related area OR 7 years' experience in eCommerce merchandising, site operations, business management, information technology, computer science or related area. 2 years' supervisory experience.
Preferred Qualifications
eCommerce Merchandising, Site Operations, Business Management, or related area; Master's degree in Information Technology, Computer Science, or related area.
Primary Location
702 SW 8th St, Bentonville, AR 72716, United States of America
Walmart and its subsidiaries are committed to maintaining a drug-free workplace and have a zero-tolerance policy regarding the use of illegal drugs and alcohol on the job. This policy applies to all employees and aims to create a safe and productive work environment.