Performance engineer Jobs in Berkeley, CA
- Machine Learning Engineer - Model Performance, Inference (San Francisco, CA)
- Senior AI Performance Engineer, Genmo (San Francisco, CA)
- Senior Propulsion and Aircraft Performance Engineer, Pyka (Alameda, CA)
- AI Agent Software Engineer - Agent Performance Engineering, Assembled (San Francisco, CA)
- Performance Engineer, Anthropic (San Francisco, CA)
- Senior HPC Performance Engineer, NVIDIA (Remote, CA)
- Sr. Building Performance Engineer, HGA (San Francisco, CA)
- Principal AI Performance Engineer, Epoch Biodesign (San Francisco, CA)
- Performance Analyst, Callan (San Francisco, CA)
- Principal AI Performance Engineer, Crusoe (San Francisco, CA)
- QE Lead Performance Engineer, Marsh & McLennan Agency LLC (San Francisco, CA)
- Lead Performance Tester / Engineer, Diverse Lynx (San Francisco, CA)
- Lead CPU Performance Engineer, VirtualVocations (Oakland, CA)
- Machine Learning Engineer - Model Performance, Solana Foundation (San Francisco, CA)
- Sr. Performance Engineer, Databricks Inc. (San Francisco, CA)
- Sr. Software Engineer - Performance, Databricks (San Francisco, CA)
- Performance Engineer, Writer (writer.com) (San Francisco, CA)
- Performance Engineer, OpenAI (San Francisco, CA)
- Performance Engineer / Tester, Omega Solutions Inc. (San Francisco, CA)

The average salary range for performance engineers is between $129,875 and $173,510 per year, with the average hovering around $153,425 per year.
Machine Learning Engineer - Model Performance
Inference.net, San Francisco, California, United States (Full-time)
Inference.net is seeking a Machine Learning Engineer to join our team, focused on optimizing the performance of our AI inference systems. This role involves working with state-of-the-art large language models and ensuring they run efficiently and effectively at scale. You will be responsible for deploying these models at scale and implementing optimizations that increase throughput and enable new features. This position offers the chance to collaborate closely with our engineering team and make significant contributions to open source projects such as SGLang and vLLM.
About Inference.net
We are building a distributed LLM inference network that combines idle GPU capacity from around the world into a single cohesive plane of compute that can be used for running large-language models like DeepSeek and Llama 4. At any given moment, we have over 5,000 GPUs and hundreds of terabytes of VRAM connected to the network.
We are a small, well-funded team working on difficult, high-impact problems at the intersection of AI and distributed systems. We primarily work in-person from our office in downtown San Francisco. Our investors include A16z CSX and Multicoin. We are high-agency, adaptable, and collaborative. We value creativity alongside technical prowess and humility. We work hard, and deeply enjoy the work that we do.
Responsibilities
Design and implement optimization techniques to increase model throughput and reduce latency across our suite of models
Deploy and maintain large language models at scale in production environments
Deploy new models as they are released by frontier labs
Implement techniques like quantization, speculative decoding, and KV cache reuse
Contribute regularly to open source projects such as SGLang and vLLM
Deep dive into underlying codebases of TensorRT, PyTorch, TensorRT-LLM, vLLM, SGLang, CUDA, and other libraries to debug ML performance issues
Collaborate with the engineering team to bring new features and capabilities to our inference platform
Develop robust and scalable infrastructure for AI model serving
Create and maintain technical documentation for inference systems
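To give a flavor of the KV cache reuse work listed above, here is a toy sketch in plain Python (not this team's actual engine): a cache keyed on prompt prefixes lets a new request skip recomputing any prefix it shares with an earlier one, which is the core idea behind prefix/KV cache reuse in engines like vLLM and SGLang. The `PrefixCache` class and the `expensive_prefill` stand-in are hypothetical illustrations, not a real API.

```python
# Toy stand-in for an expensive prefill step ("encoding" a prompt prefix).
# In a real engine the cached value would be the KV tensors for the prefix.
def expensive_prefill(tokens: tuple) -> list:
    return [hash(tok) % 997 for tok in tokens]

class PrefixCache:
    """Reuse the longest cached prefix of a new prompt instead of
    recomputing it from scratch (the idea behind KV cache reuse)."""

    def __init__(self):
        self._cache = {}   # prefix tuple -> cached "KV state"
        self.hits = 0
        self.misses = 0

    def prefill(self, tokens):
        tokens = tuple(tokens)
        # Find the longest already-computed prefix of this prompt.
        for n in range(len(tokens), 0, -1):
            if tokens[:n] in self._cache:
                self.hits += 1
                state = list(self._cache[tokens[:n]])
                suffix = tokens[n:]
                break
        else:
            self.misses += 1
            state, suffix = [], tokens
        # Only the uncached suffix pays the prefill cost.
        state += expensive_prefill(suffix)
        self._cache[tokens] = tuple(state)
        return state

cache = PrefixCache()
a = cache.prefill(["sys", "you", "are"])        # full prefill (cache miss)
b = cache.prefill(["sys", "you", "are", "hi"])  # reuses the 3-token prefix (hit)
```

In a production engine the same idea shows up as paged or radix-tree KV caches, where shared system prompts across requests are prefilled once.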
Requirements
3+ years of experience writing high-performance, production-quality code
Strong proficiency with Python and deep learning frameworks, particularly PyTorch
Demonstrated experience with LLM inference optimization techniques
Hands-on experience with SGLang and vLLM, with contributions to these projects strongly preferred
Familiarity with Docker and Kubernetes for containerized deployments
Experience with CUDA programming and GPU optimization
Strong understanding of distributed systems and scalability challenges
Proven track record of optimizing AI models for production environments
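As a rough illustration of the inference-optimization arithmetic this role touches on, the following is a minimal sketch of symmetric per-tensor int8 weight quantization in plain Python. Real deployments would use PyTorch, TensorRT, or dedicated kernels; the function names here are hypothetical and the scheme is deliberately simplified (one scale for the whole tensor, no zero point).

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127]
    using a single scale derived from the largest magnitude."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [qi * scale for qi in q]

weights = [0.5, -1.27, 0.003, 1.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Round-trip error per weight is bounded by half the quantization step.
assert all(abs(w - r) <= scale / 2 + 1e-9 for w, r in zip(weights, restored))
```

The memory saving (4 bytes to 1 byte per weight) and the bounded round-trip error are what make weight-only quantization attractive for serving large models.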
Nice to Have
Familiarity with TensorRT and TensorRT-LLM
Knowledge of vision models and multimodal AI systems
Experience implementing techniques like quantization and speculative decoding
Contributions to open source machine learning projects
Experience with large-scale distributed computing
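Speculative decoding, one of the nice-to-have techniques above, can be sketched in its greedy form: a cheap draft model proposes a few tokens, and the target model accepts the longest prefix it agrees with plus one token of its own, so the output matches plain target-model decoding while verification is batched. The stand-in "models" below are toy functions invented for illustration, not any library's API.

```python
def speculative_step(target, draft, prompt, k=4):
    """One round of greedy speculative decoding: the draft proposes k
    tokens; the target keeps the longest agreeing prefix plus one token."""
    # Draft phase: cheaply propose k tokens autoregressively.
    proposed, ctx = [], list(prompt)
    for _ in range(k):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)
    # Verify phase: the target checks the proposals in order.
    accepted, ctx = [], list(prompt)
    for tok in proposed:
        if target(ctx) == tok:
            accepted.append(tok)
            ctx.append(tok)
        else:
            break
    # The target always contributes the next token itself, so even a
    # full mismatch still advances decoding by one token.
    accepted.append(target(ctx))
    return accepted

# Toy "models": next token = sum of context mod 10; the draft is wrong
# whenever the context sum is odd.
target_model = lambda ctx: sum(ctx) % 10
draft_model = lambda ctx: sum(ctx) % 10 if sum(ctx) % 2 == 0 else 0

out = speculative_step(target_model, draft_model, [2, 4], k=3)
```

In the greedy variant the accepted tokens are exactly what the target model alone would have produced; the speedup comes from verifying several draft tokens in one target pass instead of one token per pass.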
Compensation
We offer competitive compensation, equity in a high-growth startup, and comprehensive benefits. The base salary range for this role is $180,000 - $250,000, plus competitive equity and benefits including:
Full healthcare coverage
Quarterly offsites
Flexible PTO
Equal Opportunity
Inference.net is an equal opportunity employer. We welcome applicants from all backgrounds and don't discriminate based on race, color, religion, gender, sexual orientation, national origin, genetics, disability, age, or veteran status.
If you're passionate about building the next generation of high-performance systems that push the boundaries of what's possible with large language models, we want to hear from you!