Job Opportunity
We would love to meet you if you:
- Philosophy: You are your own worst critic. You have a high bar for quality and don't rest until the job is done right; no settling for 90%. We want someone who ships fast, with high agency, and who doesn't just voice problems but actively jumps in to fix them.
- Experience: You have deep expertise in Python and PyTorch, with a strong foundation in low-level operating systems concepts including multi-threading, memory management, networking, storage, performance, and scale. You're experienced with modern inference systems like TGI, vLLM, TensorRT-LLM, and Optimum, and comfortable creating custom tooling for testing and optimization.
- Approach: You combine technical expertise with practical problem-solving. You're methodical in debugging complex systems and can rapidly prototype and validate solutions.
The core work will include:
- Architecting and implementing robust, scalable inference systems for serving state-of-the-art AI models
- Optimizing model serving infrastructure for high throughput and low latency at scale
- Developing and integrating advanced inference optimization techniques
- Working closely with our research team to bring cutting-edge capabilities into production
- Building developer tools and infrastructure to support rapid experimentation and deployment

Bonus points if you:
- Have experience with low-level systems programming (CUDA, Triton) and compiler optimization
- Are passionate about open-source contributions and staying current with ML infrastructure developments
- Bring practical experience with high-performance computing and distributed systems
- Have worked in early-stage environments where you helped shape technical direction
- Are energized by solving complex technical challenges in a collaborative environment

This is an in-person role at our office in SF. We're an early-stage company, which means the role requires working hard and moving quickly. Please only apply if that excites you.