Machine Learning Engineer | Python | PyTorch | Distributed Training | Optimization | GPU | Hybrid, San Jose, CA
Title : Machine Learning Engineer
Location : San Jose, CA
Responsibilities :
- Productize and optimize models from Research into reliable, performant, and cost-efficient services with clear SLOs (latency, availability, cost).
- Scale training across nodes / GPUs (DDP / FSDP / ZeRO, pipeline / tensor parallelism) and own throughput / time-to-train using profiling and optimization.
- Implement model-efficiency techniques (quantization, distillation, pruning, KV-cache, Flash Attention) for training and inference without materially degrading quality.
- Build and maintain model-serving systems (vLLM / Triton / TGI / ONNX / TensorRT / AITemplate) with batching, streaming, caching, and memory management.
- Integrate with vector / feature stores and data pipelines (FAISS / Milvus / Pinecone / pgvector; Parquet / Delta) as needed for production.
- Define and track performance and cost KPIs; run continuous improvement loops and capacity planning.
- Partner with ML Ops on CI / CD, telemetry / observability, model registries; partner with Scientists on reproducible handoffs and evaluations.
Educational Qualifications :
- Bachelor's in Computer Science, Electrical / Computer Engineering, or a related field required; Master's preferred (or equivalent industry experience).
- Strong systems / ML engineering background with exposure to distributed training and inference optimization.
Industry Experience :
- 3–5 years in ML / AI engineering roles owning training and / or serving in production at scale.
- Demonstrated success delivering high-throughput, low-latency ML services with reliability and cost improvements.
- Experience collaborating across Research, Platform / Infra, Data, and Product functions.
Technical Skills :
- Familiarity with deep learning frameworks : PyTorch (primary), TensorFlow.
- Exposure to large model training techniques (DDP, FSDP, ZeRO, pipeline / tensor parallelism); distributed training experience a plus.
- Optimization : experience profiling and optimizing code execution and model inference, including quantization (PTQ / QAT / AWQ / GPTQ), pruning, distillation, KV-cache optimization, and Flash Attention.
- Scalable serving : autoscaling, load balancing, streaming, batching, caching; collaboration with platform engineers.
- Data & storage : SQL / NoSQL, vector stores (FAISS / Milvus / Pinecone / pgvector), Parquet / Delta, object stores.
- Ability to write performant, maintainable code.
- Understanding of the full ML lifecycle : data collection, model training, deployment, inference, optimization, and evaluation.