Machine Learning Engineer - Model Performance

Inference

San Francisco, CA, United States

30+ days ago

Job type

Full-time

Job description

Inference.net is seeking a Machine Learning Engineer to optimize the performance of our AI inference systems. You will deploy state-of-the-art large language models at scale and optimize them to increase throughput and enable new features. The role offers the chance to collaborate closely with our engineering team and make significant contributions to open source projects like SGLang and vLLM.

About Inference.net

We are building a distributed LLM inference network that combines idle GPU capacity from around the world into a single cohesive plane of compute for running large language models like DeepSeek and Llama 4. At any given moment, we have over 5,000 GPUs and hundreds of terabytes of VRAM connected to the network.

We are a small, well-funded team working on difficult, high-impact problems at the intersection of AI and distributed systems. We primarily work in-person from our office in downtown San Francisco. Our investors include A16z CSX and Multicoin. We are high-agency, adaptable, and collaborative. We value creativity alongside technical prowess and humility. We work hard and deeply enjoy the work that we do.

Responsibilities

Design and implement optimization techniques to increase model throughput and reduce latency across our suite of models

Deploy and maintain large language models at scale in production environments

Deploy new models as they are released by frontier labs

Implement techniques like quantization, speculative decoding, and KV cache reuse

Contribute regularly to open source projects such as SGLang and vLLM

Dive deep into the underlying codebases of TensorRT, PyTorch, TensorRT-LLM, vLLM, SGLang, CUDA, and other libraries to debug ML performance issues

Collaborate with the engineering team to bring new features and capabilities to our inference platform

Develop robust and scalable infrastructure for AI model serving

Create and maintain technical documentation for inference systems

Requirements

3+ years of experience writing high-performance, production-quality code

Strong proficiency with Python and deep learning frameworks, particularly PyTorch

Demonstrated experience with LLM inference optimization techniques

Hands-on experience with SGLang and vLLM, with contributions to these projects strongly preferred

Familiarity with Docker and Kubernetes for containerized deployments

Experience with CUDA programming and GPU optimization

Strong understanding of distributed systems and scalability challenges

Proven track record of optimizing AI models for production environments

Nice to Have

Familiarity with TensorRT and TensorRT-LLM

Knowledge of vision models and multimodal AI systems

Experience implementing techniques like quantization and speculative decoding

Contributions to open source machine learning projects

Experience with large-scale distributed computing

Compensation

We offer competitive compensation, equity in a high-growth startup, and comprehensive benefits. The base salary range for this role is $180,000 - $250,000, plus competitive equity and benefits including:

Full healthcare coverage

Quarterly offsites

Flexible PTO

Equal Opportunity

Inference.net is an equal opportunity employer. We welcome applicants from all backgrounds and don't discriminate based on race, color, religion, gender, sexual orientation, national origin, genetics, disability, age, or veteran status.

If you're passionate about building the next generation of high-performance systems that push the boundaries of what's possible with large language models, we want to hear from you!
