ML Infrastructure Engineer
Menlo Park, CA | On-Site | Full-Time / Direct Hire
Looking for ML infrastructure experts (Bay Area preferred) with deep experience in CUDA, GPU optimization, vLLM, and LLM inference. Pure language focus: no vision or audio.
Client Opportunity | Through Phizenix
Phizenix, a certified minority- and women-led recruiting firm, is hiring on behalf of an AI startup pioneering diffusion-based large language models, built for faster generation, multimodal integration, and scalable enterprise deployment.
We're looking for an ML Infrastructure Engineer to help build the infrastructure that powers large-scale model training and real-time inference. You'll collaborate with world-class researchers and engineers to design high-performance, distributed systems that bring advanced LLMs into production.
Responsibilities
- Design and manage distributed infrastructure for ML training at scale
- Optimize model serving systems for low-latency inference
- Build automated pipelines for data processing, model training, and deployment
- Implement observability tools to monitor performance in production
- Maximize resource utilization across GPU clusters and cloud environments
- Translate research requirements into robust, scalable system designs
Must-Haves
- Master's or PhD in Computer Science, Engineering, or a related field (or equivalent experience)
- Strong foundation in software engineering, systems design, and distributed systems
- Experience with cloud platforms (AWS, GCP, or Azure)
- Proficient in Python and at least one systems-level language (C++ / Rust / Go)
- Hands-on experience with Docker, Kubernetes, and CI / CD workflows
- Familiarity with ML frameworks like PyTorch or TensorFlow from a systems perspective
- Understanding of GPU programming and high-performance infrastructure
Nice-to-Haves
- Experience with large-scale ML training clusters and GPU orchestration
- Knowledge of LLM-serving tools (vLLM, TensorRT, ONNX Runtime)
- Experience with distributed training strategies (e.g., data / model / pipeline parallelism)
- Familiarity with orchestration tools like Kubeflow or Airflow
- Background in performance tuning, system profiling, and MLOps best practices
At Phizenix, we're committed to supporting diverse and inclusive teams. This is your chance to shape the systems that power the next generation of AI innovation. Let's build the future, together.
California Pay Range
$180,000-$200,000 USD