Senior Software Engineer - ML / LLM Serving
This range is provided by Alldus; actual pay will be based on your skills and experience.
Base pay range
$180,000.00 / yr - $220,000.00 / yr
About the Role
We are seeking a senior/staff Machine Learning Serving Software Engineer who thrives in a fast-paced, customer-focused environment and can build robust, flexible infrastructure to serve a diverse range of ML models, including both LLMs and classical ML.
The Role
As an ML Serving Engineer, you will design, implement, and optimize infrastructure that powers the deployment and inference of machine learning models across varied customer environments. You’ll work closely with product, research, and customer engineering teams to deliver low-latency, secure, and scalable ML serving solutions.
Responsibilities
- Design and build scalable, high-performance ML serving infrastructure capable of handling diverse model types (LLMs, recommendation systems, etc.).
- Optimize inference pipelines for latency, throughput, and cost efficiency.
- Integrate with a wide range of customer environments, adapting serving strategies to fit their infrastructure and compliance needs.
- Deploy, monitor, and maintain ML models in production using modern deployment stacks.
- Collaborate with ML researchers to operationalize new models and ensure seamless integration into customer workflows.
- Ensure security and privacy best practices are applied to model deployment and inference, aligning with enterprise-grade data security requirements.
- Stay up-to-date with the latest serving technologies and frameworks, evaluating and integrating them where relevant.
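To make the latency/throughput trade-off above concrete, here is a minimal sketch of request micro-batching, one common technique for raising inference throughput. All names (`MicroBatcher`, `run_model`) are illustrative, not part of any framework mentioned in this posting:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List

@dataclass
class MicroBatcher:
    """Collects individual requests and runs the model on whole batches.

    `run_model` is a hypothetical stand-in for a real inference call
    (e.g. a GPU forward pass); batching amortizes its fixed per-call cost.
    """
    run_model: Callable[[List[Any]], List[Any]]
    max_batch_size: int = 8
    _pending: List[Any] = field(default_factory=list)
    results: List[Any] = field(default_factory=list)

    def submit(self, request: Any) -> None:
        self._pending.append(request)
        if len(self._pending) >= self.max_batch_size:
            self.flush()

    def flush(self) -> None:
        # Run the model once on everything queued so far.
        if self._pending:
            self.results.extend(self.run_model(self._pending))
            self._pending.clear()

# Toy "model" that squares each input; a real server would call into
# an inference runtime here.
batcher = MicroBatcher(run_model=lambda xs: [x * x for x in xs],
                       max_batch_size=4)
for i in range(6):
    batcher.submit(i)
batcher.flush()          # drain the final partial batch
print(batcher.results)   # [0, 1, 4, 9, 16, 25]
```

Production servers (e.g. Triton's dynamic batching or vLLM's continuous batching) add a time-based flush deadline on top of the size threshold, so a lone request is not stuck waiting for a full batch; that is the latency side of the trade-off.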
Qualifications
Required
- 5+ years of professional software engineering experience, with at least 3 years focused on ML serving, inference infrastructure, or similar domains.
- Proven experience deploying and optimizing large language models (LLMs) in production.
- Hands-on expertise with multiple ML serving frameworks (e.g., TensorFlow Serving, TorchServe, Triton Inference Server, BentoML, Ray Serve, vLLM).
- Strong programming skills in Python, Go, or C++.
- Experience with distributed systems and container orchestration tools (Kubernetes, Docker).
- Familiarity with observability stacks (Prometheus, Grafana, OpenTelemetry) and performance profiling for inference workloads.
- Solid understanding of secure data handling and privacy-preserving ML practices.
- Knowledge of cloud platforms (AWS, GCP, Azure) and hybrid/on-prem deployment scenarios.
Preferred
- Prior experience serving multiple model types beyond LLMs, e.g., recommendation engines and classical ML models.
- Exposure to model quantization, distillation, caching, and other optimization techniques for inference efficiency.
- Experience working with enterprise customers or within compliance-heavy environments.
Details
- Seniority level: Mid-Senior level
- Employment type: Full-time
- Job function: Software Development
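Of the preferred optimization techniques listed above, caching is the simplest to illustrate. The sketch below uses Python's standard `functools.lru_cache` to memoize an inference call; `cached_infer` and the call counter are hypothetical stand-ins, not any real serving API:

```python
from functools import lru_cache

# Counts how often the underlying "model" actually runs, so the
# cache's effect is observable.
calls = 0

@lru_cache(maxsize=1024)
def cached_infer(prompt: str) -> str:
    """Hypothetical stand-in for an expensive model call."""
    global calls
    calls += 1
    # A real implementation would run tokenization + a forward pass here.
    return prompt.upper()

cached_infer("hello")
cached_infer("hello")   # exact repeat: served from cache, model not rerun
cached_infer("world")
print(calls)            # 2
```

Note that `lru_cache` only helps on exact-match inputs; production LLM serving more often relies on KV-cache reuse or semantic caching, where near-duplicate prompts can also hit the cache.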