ML Model Serving Engineer

SESAME, San Francisco, CA, United States
1 day ago
Job type
  • Full-time
Job description

Sesame Job Opportunity

Sesame believes in a future where computers are lifelike, with the ability to see, hear, and collaborate with us in ways that feel natural and human. With this vision, we're designing a new kind of computer, focused on making voice companions part of our daily lives. Our team brings together founders from Oculus and Ubiquity6, alongside proven leaders from Meta, Google, and Apple, with deep expertise spanning hardware and software. Join us in shaping a future where computers truly come alive.

Responsibilities:

  • Turbocharge our serving layer, consisting of a variety of LLM, speech, and vision models.
  • Partner with ML infrastructure and training engineers to build a fast, cost-effective, accurate, and reliable serving layer to power a new consumer product category.
  • Modify and extend LLM serving frameworks like vLLM and SGLang to take advantage of the latest techniques in high-performance model serving.
  • Experiment with new compilers to support running models on a variety of hardware compute platforms.
  • Work with the training team to identify opportunities to produce faster models without sacrificing quality.
  • Use techniques like in-flight batching, caching, and custom kernels to speed up inference.
  • Find ways to reduce model initialization times without sacrificing quality.

Required Qualifications:

  • Expert in a differentiable array computing framework, preferably PyTorch.
  • Expert in optimizing machine learning models for serving reliably at high throughput, with low latency.
  • Significant systems programming experience, e.g., experience working on high-performance server systems; you'd be just as comfortable with the internals of vLLM as you would with a complex PyTorch codebase.
  • Significant performance engineering experience, e.g., bottleneck analysis in high-scale server systems or profiling low-level systems code.
  • Always up to date on the latest techniques for model serving optimization.

Preferred Qualifications:

  • Familiarity with high-performance LLM serving, e.g., experience with vLLM and SGLang deployment and internals.
  • Experience with a public cloud platform such as GCP, AWS, or Azure.
  • Experience deploying and scaling inference workloads in the cloud using Kubernetes, Ray, etc.
  • You like to ship and have a track record of leading complex multi-month projects without assistance.
  • You're excited to learn new things and work in a multitude of roles.

Sesame is committed to a workplace where everyone feels valued, respected, and empowered. We welcome all qualified applicants, embracing diversity in race, gender, identity, orientation, ability, and more. We provide reasonable accommodations for applicants with disabilities; contact careers@ for assistance.

Full-time Employee Benefits:

  • 401k matching
  • 100% employer-paid health, vision, and dental benefits
  • Unlimited PTO and sick time
  • Flexible spending account matching (medical FSA)
  • Benefits do not apply to contingent / contract workers
