Talent.com
LLM / ML Engineer (Inference)

Reducto
San Francisco, CA, United States
16 hours ago
Job type
  • Full-time
Job description

Job Opportunity

We would love to meet you if the following describes you:

  • Philosophy: You are your own worst critic. You have a high bar for quality and don't rest until the job is done right: no settling for 90%. We want someone who ships fast, with high agency, and who doesn't just voice problems but actively jumps in to fix them.
  • Experience: You have deep expertise in Python and PyTorch, with a strong foundation in low-level operating systems concepts including multi-threading, memory management, networking, storage, performance, and scale. You're experienced with modern inference systems like TGI, vLLM, TensorRT-LLM, and Optimum, and comfortable creating custom tooling for testing and optimization.
  • Approach: You combine technical expertise with practical problem-solving. You're methodical in debugging complex systems and can rapidly prototype and validate solutions.

The core work will include:

  • Architecting and implementing robust, scalable inference systems for serving state-of-the-art AI models
  • Optimizing model serving infrastructure for high throughput and low latency at scale
  • Developing and integrating advanced inference optimization techniques
  • Working closely with our research team to bring cutting-edge capabilities into production
  • Building developer tools and infrastructure to support rapid experimentation and deployment

Bonus points if you:

  • Have experience with low-level systems programming (CUDA, Triton) and compiler optimization
  • Are passionate about open-source contributions and staying current with ML infrastructure developments
  • Bring practical experience with high-performance computing and distributed systems
  • Have worked in early-stage environments where you helped shape technical direction
  • Are energized by solving complex technical challenges in a collaborative environment

This is an in-person role at our office in SF. We're an early-stage company, which means that the role requires working hard and moving quickly. Please only apply if that excites you.
