DevOps Engineer – LLM & GPU Inference Services

Noblesoft Technologies – CA, United States
Job type
  • Full-time
Job description

Job Title: DevOps Engineer – LLM & GPU Inference Services

Location: California (Remote)

Job Description:

We are looking for engineers with general cloud services / distributed systems experience; LLM experience is a secondary skill, and GPU experience is now lower on the list of preferred skills. Team: Dedicated Inference Service

Required Skills:

  • Deep experience building services in modern cloud environments on distributed systems, e.g., containerization (Kubernetes, Docker), infrastructure as code, CI/CD pipelines, APIs, authentication and authorization, data storage, deployment, logging, monitoring, and alerting
  • Experience working with Large Language Models (LLMs), particularly hosting them for inference
  • Strong verbal and written communication skills. Your job will involve communicating with local and remote colleagues about technical subjects and writing detailed documentation.
  • Experience building or using benchmarking tools to evaluate LLM inference across various model, engine, and GPU combinations
  • Familiarity with LLM performance metrics such as prefill throughput, decode throughput, time per output token (TPOT), and time to first token (TTFT)
  • Experience with one or more inference engines, e.g., vLLM, SGLang, or Modular Max
  • Familiarity with one or more distributed inference serving frameworks, e.g., llm-d, NVIDIA Dynamo, or Ray Serve
  • Experience with AMD and NVIDIA GPUs, using software like CUDA, ROCm, AITER, NCCL, RCCL, etc.
  • Knowledge of distributed inference optimization techniques: tensor/data parallelism, KV cache optimizations, smart routing, etc.

What You'll Be Working On:

  • Develop and maintain an inference platform for serving large language models, optimized for the various GPU platforms they run on.
  • Work on complex AI and cloud engineering projects through the entire product development lifecycle (PDLC): ideation, product definition, experimentation, prototyping, development, testing, release, and operations.
  • Build tooling and observability to monitor system health, and build auto-tuning capabilities.
  • Build benchmarking frameworks to test model serving performance to guide system and infrastructure tuning efforts.
  • Build native cross-platform inference support across NVIDIA and AMD GPUs for a variety of model architectures.
  • Contribute to open source inference engines to make them perform better on the DigitalOcean cloud.