You’ll own the core models and prompts that power Gamma. We weave together text, image, and layout generation to automate all the drudgery of building presentations and websites, and we use AI throughout our product. Your job is to elevate quality, evaluate new models, and push the frontier with new features and modalities.
This role is about productizing existing models, not training new ones. You’ll focus on prompting, evaluating, and fine‑tuning foundation models for maximum performance. With over 1 million AI‑generated presentations and 6 million AI images created daily, you’ll work at massive scale. You’ll own our existing LLM and image prompts, build evaluation frameworks to measure quality, and constantly test new frontier models and methods. You’ll also launch new modalities like voice and video, curate datasets for fine‑tuning, and own uptime, latency, and costs.
You’ll succeed here if you’re a tinkerer who loves pushing the limits of foundation models. You need strong software engineering skills in TypeScript and Python, a data‑driven approach to raising AI quality, and experience building and evaluating prompts at scale. If you get excited about mixing prompt engineering with traditional software engineering to unlock new AI capabilities, this is your role.
Our team has a strong in‑office culture and works in person 4–5 days per week in San Francisco. We love working together to stay creative and connected, with flexibility to work from home when focus matters most.
What you’ll do
Own our existing LLM and image prompts, measuring and continuously improving quality at scale
Develop complex prompts for new features using AI JSX, balancing creativity with reliability
Build evaluation frameworks for our prompts and models, monitoring metrics and qualitative feedback to create better test sets
Drive the roadmap based on quality gaps, constantly evaluating new frontier models and methods
Curate datasets for fine‑tuning open‑source models, and launch new modalities like voice and video
Build analytics and tracking systems while owning uptime, latency, and costs across our AI infrastructure
What you’ll bring
Prompt hacker: You’re a tinkerer who loves seeing how far you can push the limits of a foundation model, with experience building and evaluating prompts at scale
Software engineer: Experienced developer comfortable in TypeScript and Python, excited about mixing prompt engineering with traditional software engineering
Data‑driven: You embrace using data to raise the bar of AI quality, with skills in writing evals, designing metrics, and turning qualitative feedback into quantitative measures
Self‑sufficient in gathering and cleaning data to inform prompt improvements and model evaluations
Experience working with modern LLMs, plus image models like Flux and Imagen (Nice to have)
Familiarity with AI tooling like AI JSX for prompting and Braintrust for evaluations (Nice to have)
AI Engineer • San Francisco, CA, United States