# Job Description: AI Task Evaluation & Statistical Analysis Specialist

## Role Overview

We're seeking a data-driven analyst to conduct comprehensive failure analysis on AI agent performance across finance-sector tasks. You'll identify patterns, root causes, and systemic issues in our evaluation framework by analyzing task performance across multiple dimensions (task types, file types, criteria, etc.).

## Key Responsibilities

- Statistical Failure Analysis: Identify patterns in AI agent failures across task components (prompts, rubrics, templates, file types, tags)
- Root Cause Analysis: Determine whether failures stem from task design, rubric clarity, file complexity, or agent limitations
- Dimension Analysis: Analyze performance variations across finance sub-domains, file types, and task categories
- Reporting & Visualization: Create dashboards and reports highlighting failure clusters, edge cases, and improvement opportunities
- Quality Framework: Recommend improvements to task design, rubric structure, and evaluation criteria based on statistical findings
- Stakeholder Communication: Present insights to data labeling experts and technical teams

## Required Qualifications

- Statistical Expertise: Strong foundation in statistical analysis, hypothesis testing, and pattern recognition
- Programming: Proficiency in Python (pandas, scipy, matplotlib/seaborn) or R for data analysis
- Data Analysis: Experience with exploratory data analysis and creating actionable insights from complex datasets
- AI/ML Familiarity: Understanding of LLM evaluation methods and quality metrics
- Tools: Comfortable working with Excel, data visualization tools (Tableau/Looker), and SQL

## Preferred Qualifications

- Experience with AI/ML model evaluation or quality assurance
- Background in finance or willingness to learn finance domain concepts
- Experience with multi-dimensional failure analysis
- Familiarity with benchmark datasets and evaluation frameworks
- 2-4 years of relevant experience