Talent.com
Senior Software Engineer Model Performance
Inference • San Francisco, California, USA
4 days ago
Job type
  • Full-time
Job description

Help us make inference blazingly fast. If you love squeezing every last drop of performance out of GPUs, diving deep into CUDA kernels, and turning optimization techniques into production systems, we'd love to meet you.

About

We train and host specialized language models for companies that need frontier-quality AI at a fraction of the cost. The models we train match GPT-5 accuracy but are smaller, faster, and up to 90% cheaper. Our platform handles everything end-to-end: distillation, training, evaluation, and planet-scale hosting.

We are a well-funded ten-person team of engineers who work in-person in downtown San Francisco on difficult, high-impact engineering problems. Everyone on the team has been writing code for over 10 years and has founded and run their own software companies. We are high-agency, adaptable, and collaborative. We value creativity alongside technical prowess and humility. We work hard and deeply enjoy the work that we do. Most of us are in the office 4 days a week in SF; hybrid works for Bay Area candidates.

About the Role

You will be responsible for making our inference stack as fast and efficient as possible. Your work spans from implementing known optimization techniques to experimenting with novel approaches, always with the goal of serving models faster and cheaper at scale.

Your north star is inference performance: latency, throughput, cost efficiency, and how quickly we can bring new model architectures into production. You'll work across the full inference stack, from CUDA kernels to serving frameworks, to find and eliminate bottlenecks. This role reports directly to the founding team. You'll have the autonomy, a large compute budget, and technical support to push the limits of what's possible in model serving.

Key Responsibilities

Implement and productionize optimization techniques, including quantization, speculative decoding, KV cache optimization, continuous batching, and LoRA serving

Deep dive into inference frameworks (vLLM, SGLang, TensorRT-LLM) and underlying libraries to debug and improve performance

Profile and optimize CUDA kernels and GPU utilization across our serving infrastructure

Add support for new model architectures, ensuring they meet our performance standards before going to production

Experiment with novel inference techniques and bring successful approaches into production

Build tooling and benchmarks to measure and track inference performance across our fleet

Collaborate with applied ML engineers to ensure trained models can be served efficiently

Requirements

2 years of experience in ML systems, inference optimization, or GPU programming

Strong proficiency in Python and familiarity with C

Hands-on experience with LLM inference frameworks (vLLM, SGLang, TensorRT-LLM, or similar)

Deep understanding of GPU architecture and experience profiling GPU workloads

Familiarity with LLM optimization techniques (quantization, speculative decoding, continuous batching, KV cache management)

Experience with PyTorch and an understanding of how models execute on hardware

Track record of measurably improving system performance

Nice-to-Have

Experience with CUDA programming

Familiarity with serving non-LLM models (TTS, vision, embeddings)

Experience with distributed inference and multi-GPU serving

Contributions to open-source inference frameworks

Experience with Docker and Kubernetes

You don't need to tick every box. Curiosity and the ability to learn quickly matter more.

Compensation

We offer competitive compensation, equity in a high-growth startup, and comprehensive benefits. The base salary range for this role is $220,000 - $320,000, plus equity and benefits, depending on experience.

Equal Opportunity

We are an equal opportunity employer. We welcome applicants from all backgrounds and don't discriminate based on race, color, religion, gender, sexual orientation, national origin, genetics, disability, age, or veteran status.

If you're excited about making AI inference faster for everyone, we'd love to hear from you. Please send us your resume and GitHub profile and/or apply here on Ashby.

Required Experience :

Senior IC

Key Skills

JProfiler, Splunk, Performance Testing, Fiddler, Apache, HP Performance Center, LoadRunner, New Relic, Scalability, J2EE, Java, Scripting

Employment Type : Full-Time

Department / Functional Area : Engineering

Vacancy : 1

Salary : $220,000 - $320,000
