Machine Learning Engineer | Python | PyTorch | Distributed Training | Optimisation | GPU | Hybrid, San Jose, CA

Enigma • San Jose, CA, United States
7 days ago
Job type
  • Full-time
Job description

Title : Machine Learning Engineer

Location : San Jose, CA

Responsibilities :

  • Productize and optimize models from Research into reliable, performant, and cost-efficient services with clear SLOs (latency, availability, cost).
  • Scale training across nodes / GPUs (DDP / FSDP / ZeRO, pipeline / tensor parallelism) and own throughput / time-to-train using profiling and optimization.
  • Implement model-efficiency techniques (quantization, distillation, pruning, KV-cache, Flash Attention) for training and inference without materially degrading quality.
  • Build and maintain model-serving systems (vLLM / Triton / TGI / ONNX / TensorRT / AITemplate) with batching, streaming, caching, and memory management.
  • Integrate with vector / feature stores and data pipelines (FAISS / Milvus / Pinecone / pgvector; Parquet / Delta) as needed for production.
  • Define and track performance and cost KPIs; run continuous improvement loops and capacity planning.
  • Partner with ML Ops on CI / CD, telemetry / observability, model registries; partner with Scientists on reproducible handoffs and evaluations.

Educational Qualifications :

  • Bachelor’s in Computer Science, Electrical / Computer Engineering, or a related field required; Master’s preferred (or equivalent industry experience).
  • Strong systems / ML engineering background with exposure to distributed training and inference optimization.

Industry Experience :

  • 3–5 years in ML / AI engineering roles owning training and / or serving in production at scale.
  • Demonstrated success delivering high-throughput, low-latency ML services with reliability and cost improvements.
  • Experience collaborating across Research, Platform / Infra, Data, and Product functions.

Technical Skills :

  • Familiarity with deep learning frameworks : PyTorch (primary), TensorFlow.
  • Exposure to large-model training techniques (DDP, FSDP, ZeRO, pipeline / tensor parallelism); distributed training experience a plus.
  • Optimization : experience profiling and optimizing code execution and model inference, including quantization (PTQ / QAT / AWQ / GPTQ), pruning, distillation, KV-cache optimization, and Flash Attention.
  • Scalable serving : autoscaling, load balancing, streaming, batching, caching; collaboration with platform engineers.
  • Data & storage : SQL / NoSQL, vector stores (FAISS / Milvus / Pinecone / pgvector), Parquet / Delta, object stores.
  • Ability to write performant, maintainable code.
  • Understanding of the full ML lifecycle : data collection, model training, deployment, inference, optimization, and evaluation.