ML Infrastructure Engineer

Phizenix | Menlo Park, CA, United States

30+ days ago

Job type

Full-time

Permanent

Job description

ML Infrastructure Engineer

Menlo Park, CA | On-Site | Full-Time / Direct Hire

Looking for ML infra experts (Bay Area preferred) with deep experience in CUDA, GPU optimization, vLLM, and LLM inference. Pure language focus: no vision or audio.

Client Opportunity | Through Phizenix

Phizenix, a certified minority- and women-led recruiting firm, is hiring on behalf of an AI startup pioneering diffusion-based large language models, built for faster generation, multimodal integration, and scalable enterprise deployment.

We're looking for an ML Infrastructure Engineer to help build the infrastructure that powers large-scale model training and real-time inference. You'll collaborate with world-class researchers and engineers to design high-performance distributed systems that bring advanced LLMs into production.

Responsibilities

Design and manage distributed infrastructure for ML training at scale
Optimize model serving systems for low-latency inference
Build automated pipelines for data processing, model training, and deployment
Implement observability tools to monitor performance in production
Maximize resource utilization across GPU clusters and cloud environments
Translate research requirements into robust, scalable system designs

Must-Haves

Master's or PhD in Computer Science, Engineering, or a related field (or equivalent experience)

Strong foundation in software engineering, systems design, and distributed systems

Experience with cloud platforms (AWS, GCP, or Azure)

Proficient in Python and at least one systems-level language (C++ / Rust / Go)

Hands-on experience with Docker, Kubernetes, and CI/CD workflows

Familiarity with ML frameworks like PyTorch or TensorFlow from a systems perspective

Understanding of GPU programming and high-performance infrastructure

Nice-to-Haves

Experience with large-scale ML training clusters and GPU orchestration

Knowledge of LLM-serving tools (vLLM, TensorRT, ONNX Runtime)

Experience with distributed training strategies (e.g., data, model, and pipeline parallelism)

Familiarity with orchestration tools like Kubeflow or Airflow

Background in performance tuning, system profiling, and MLOps best practices

At Phizenix, we're committed to supporting diverse and inclusive teams. This is your chance to shape the systems that power the next generation of AI innovation. Let's build the future, together.

California Pay Range

$180,000-$200,000 USD
