Machine Learning Engineer - Model Performance

Inference

San Francisco, CA, United States

30+ days ago

Job type

Full-time

Job description

Inference.net is seeking a Machine Learning Engineer to optimize the performance of our AI inference systems. You will deploy state-of-the-art large language models at scale and optimize them to increase throughput and enable new features. This position offers the chance to collaborate closely with our engineering team and to make significant contributions to open source projects such as SGLang and vLLM.

About Inference.net

We are building a distributed LLM inference network that combines idle GPU capacity from around the world into a single cohesive plane of compute for running large language models like DeepSeek and Llama 4. At any given moment, we have over 5,000 GPUs and hundreds of terabytes of VRAM connected to the network.

We are a small, well-funded team working on difficult, high-impact problems at the intersection of AI and distributed systems. We primarily work in person from our office in downtown San Francisco. Our investors include a16z CSX and Multicoin. We are high-agency, adaptable, and collaborative. We value creativity alongside technical prowess and humility. We work hard and deeply enjoy the work that we do.

Responsibilities

Design and implement optimization techniques to increase model throughput and reduce latency across our suite of models

Deploy and maintain large language models at scale in production environments

Deploy new models as they are released by frontier labs

Implement techniques like quantization, speculative decoding, and KV cache reuse

Contribute regularly to open source projects such as SGLang and vLLM

Dive deep into the underlying codebases of TensorRT, PyTorch, TensorRT-LLM, vLLM, SGLang, CUDA, and other libraries to debug ML performance issues

Collaborate with the engineering team to bring new features and capabilities to our inference platform

Develop robust and scalable infrastructure for AI model serving

Create and maintain technical documentation for inference systems

Requirements

3+ years of experience writing high-performance, production-quality code

Strong proficiency with Python and deep learning frameworks, particularly PyTorch

Demonstrated experience with LLM inference optimization techniques

Hands-on experience with SGLang and vLLM, with contributions to these projects strongly preferred

Familiarity with Docker and Kubernetes for containerized deployments

Experience with CUDA programming and GPU optimization

Strong understanding of distributed systems and scalability challenges

Proven track record of optimizing AI models for production environments

Nice to Have

Familiarity with TensorRT and TensorRT-LLM

Knowledge of vision models and multimodal AI systems

Experience implementing techniques like quantization and speculative decoding

Contributions to open source machine learning projects

Experience with large-scale distributed computing

Compensation

We offer competitive compensation, equity in a high-growth startup, and comprehensive benefits. The base salary range for this role is $180,000 - $250,000. Benefits include:

Full healthcare coverage

Quarterly offsites

Flexible PTO

Equal Opportunity

Inference.net is an equal opportunity employer. We welcome applicants from all backgrounds and don't discriminate based on race, color, religion, gender, sexual orientation, national origin, genetics, disability, age, or veteran status.

If you're passionate about building the next generation of high-performance systems that push the boundaries of what's possible with large language models, we want to hear from you!
