ML Model Serving Engineer

SESAME, San Francisco, CA, United States

1 day ago

Job type

Full-time

Job description

Sesame Job Opportunity

Sesame believes in a future where computers are lifelike - with the ability to see, hear, and collaborate with us in ways that feel natural and human. With this vision, we're designing a new kind of computer, focused on making voice companions part of our daily lives. Our team brings together founders from Oculus and Ubiquity6, alongside proven leaders from Meta, Google, and Apple, with deep expertise spanning hardware and software. Join us in shaping a future where computers truly come alive.

Responsibilities:

Turbocharge our serving layer, consisting of a variety of LLM, speech, and vision models.
Partner with ML infrastructure and training engineers to build a fast, cost-effective, accurate, and reliable serving layer to power a new consumer product category.
Modify and extend LLM serving frameworks like vLLM and SGLang to take advantage of the latest techniques in high-performance model serving.
Experiment with new compilers to support running models on a variety of hardware compute platforms.
Work with the training team to identify opportunities to produce faster models without sacrificing quality.
Use techniques like in-flight batching, caching, and custom kernels to speed up inference.
Find ways to reduce model initialization times without sacrificing quality.

Required Qualifications:

Expert in some differentiable array computing framework, preferably PyTorch.

Expert in optimizing machine learning models for serving reliably at high throughput, with low latency.

Significant systems programming experience, e.g., experience working on high-performance server systems. You'd be just as comfortable with the internals of vLLM as you would with a complex PyTorch codebase.

Significant performance engineering experience, e.g., bottleneck analysis in high-scale server systems or profiling low-level systems code.

Always up to date on the latest techniques for model serving optimization.

Preferred Qualifications:

Familiarity with high-performance LLM serving, e.g., experience with the deployment and internals of vLLM or SGLang.

Experience with a public cloud platform such as GCP, AWS, or Azure.

Experience deploying and scaling inference workloads in the cloud using Kubernetes, Ray, etc.

You like to ship and have a track record of leading complex multi-month projects without assistance.

You're excited to learn new things and work in a multitude of roles.

Sesame is committed to a workplace where everyone feels valued, respected, and empowered. We welcome all qualified applicants, embracing diversity in race, gender, identity, orientation, ability, and more. We provide reasonable accommodations for applicants with disabilities; contact careers@ for assistance.

Full-time Employee Benefits:

401k matching

100% employer-paid health, vision, and dental benefits

Unlimited PTO and sick time

Flexible spending account matching (medical FSA)

Benefits do not apply to contingent / contract workers
