ML Infrastructure Engineer

Phizenix | Menlo Park, CA, United States
30+ days ago
Job type
  • Full-time
  • Permanent
Job description

Menlo Park, CA | On-Site | Full-Time / Direct Hire

Looking for ML infrastructure experts (Bay Area preferred) with deep experience in CUDA, GPU optimization, vLLM, and LLM inference. The focus is purely on language models: no vision or audio.

Client Opportunity | Through Phizenix

Phizenix, a certified minority- and women-led recruiting firm, is hiring on behalf of an AI startup pioneering diffusion-based large language models built for faster generation, multimodal integration, and scalable enterprise deployment.

We're looking for an ML Infrastructure Engineer to help build the infrastructure that powers large-scale model training and real-time inference. You'll collaborate with world-class researchers and engineers to design high-performance distributed systems that bring advanced LLMs into production.

Responsibilities

  • Design and manage distributed infrastructure for ML training at scale
  • Optimize model serving systems for low-latency inference
  • Build automated pipelines for data processing, model training, and deployment
  • Implement observability tools to monitor performance in production
  • Maximize resource utilization across GPU clusters and cloud environments
  • Translate research requirements into robust, scalable system designs

Must-Haves

  • Master's or PhD in Computer Science, Engineering, or a related field (or equivalent experience)
  • Strong foundation in software engineering, systems design, and distributed systems
  • Experience with cloud platforms (AWS, GCP, or Azure)
  • Proficient in Python and at least one systems-level language (C++ / Rust / Go)
  • Hands-on experience with Docker, Kubernetes, and CI/CD workflows
  • Familiarity with ML frameworks like PyTorch or TensorFlow from a systems perspective
  • Understanding of GPU programming and high-performance infrastructure

Nice-to-Haves

  • Experience with large-scale ML training clusters and GPU orchestration
  • Knowledge of LLM-serving tools (vLLM, TensorRT, ONNX Runtime)
  • Experience with distributed training strategies (e.g., data / model / pipeline parallelism)
  • Familiarity with orchestration tools like Kubeflow or Airflow
  • Background in performance tuning, system profiling, and MLOps best practices

At Phizenix, we're committed to supporting diverse and inclusive teams. This is your chance to shape the systems that power the next generation of AI innovation. Let's build the future, together.

California Pay Range

$180,000-$200,000 USD
