Overview
Machine Learning Engineer — Post-Training, Evaluation & Continuous Improvement
Location: Bay Area preferred (remote-first; ~10% in-office for workshops/on-sites)
Employment: Full-time (new grads welcome)
- Internship/co-op options available
About AlphaX
AlphaX builds financial reasoning models and agent workflows for professional investment research. Our stack spans RLHF/DPO post-training, reasoning workflow generation, agent tooling & model routing, and a financial data lake (prices, filings, transcripts, news).
The Role
Own the complete post-training life cycle for AlphaX’s financial reasoning model—from data curation through human-in-the-loop feedback, evaluation, regression gating, and continuous improvement. You’ll stand up and maintain the training / eval pipelines that turn research ideas into measurable production gains across our analyst-style task suite (e.g., earnings analysis, risk scoring, forecast memos).
What You’ll Do
Post-training & alignment
- Run and refine SFT and preference-based training (RLHF, DPO, KTO, or similar) on finance-specific datasets.
- Train and version reward models and rubric-based scorers for reasoning quality, factuality, and safety.
- Build ingestion and cleaning pipelines for filings, transcripts, market data, and analyst workflows; dedup, redact, and split with leakage controls.
- Operate human-in-the-loop feedback loops (experts, students, crowd) with clear rubrics and QA.
- Design a multi-layer eval harness: unit tests for tools/prompts, scenario suites for research tasks, red-team probes, latency/cost tracking.
- Implement automated A/B and canary gating with statistically sound decision rules and regression alerts.
- Instrument chain-of-thought-free scoring proxies, tool-use success rates, and multi-step task completion.
- Tune prompt policies, tool-calling strategies, and model routing (OpenAI/Claude/Gemini, etc.) behind a consistent interface.
- Ship pipelines on GPUs, schedule jobs, track experiments, and maintain reproducible artifacts and datasets.
- Add observability for drift, outliers, and PII/financial-compliance checks.
- Close the loop: mine failures from production, generate counter-examples, synthesize new training/eval data, and re-train with tight feedback cycles.
Qualifications
- BS/MS (or rising senior) in CS/EE/Math or equivalent, with hands-on experience shipping post-training pipelines.
- Practical experience with one or more of: RLHF/DPO/KTO, reward modeling, or structured preference data.
- Experience building training and evaluation pipelines end-to-end (data → train → eval → release), including experiment tracking (e.g., MLflow/W&B) and artifact/version control (e.g., DVC, Git-LFS).
- Comfort reading financial text and reasoning about factuality and compliance (you don’t need to be an investor, just curious and precise).
Tech You’ll Touch
Python, PyTorch, Ray, MLflow/W&B, Airflow/Prefect, GPUs, Postgres/BigQuery, vector DBs, Docker/K8s, and major model APIs (OpenAI/Claude/Gemini).
Work Setup
Remote-first with ~10% in-office (Bay Area) for collaboration sprints, model & agent jam sessions, and eval workshops.