Location: Charlotte, NC or Jersey City, NJ (3 days a week onsite); the resource will choose which site.
HYBRID schedule
Business hours: Monday through Friday, normal business hours.
Must have skills : LLM and Kubernetes
Project Tasks:
AI Operations Platform Consultant
- Experience deploying, managing, operating, and troubleshooting containerized services at scale on Kubernetes/OpenShift for mission-critical applications
- Experience deploying, configuring, and tuning LLMs using TensorRT-LLM and Triton Inference Server
- Managing MLOps/LLMOps pipelines that use TensorRT-LLM and Triton Inference Server to deploy inference services in production
- Setting up and operating monitoring of AI inference services for performance and availability
- Experience deploying and troubleshooting LLMs on a containerized platform, including monitoring, load balancing, etc.
- Experience with standard processes for operating a mission-critical system: incident management, change management, event management, etc.
- Managing scalable infrastructure for deploying and operating LLMs
- Deploying models in production environments, including containerization, microservices, and API design
- Knowledge of Triton Inference Server, including its architecture, configuration, and deployment
- Model optimization using Triton with TensorRT-LLM (TRT-LLM)
- Model optimization techniques, including pruning, quantization, and knowledge distillation
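As a concrete illustration of the Kubernetes-plus-Triton skill set described above, a minimal Deployment manifest for running Triton Inference Server on a cluster might look like the sketch below. The image tag, model-repository path, PVC name, and resource values are illustrative assumptions, not details from this posting; Triton's standard ports (8000 HTTP, 8001 gRPC, 8002 Prometheus metrics) and the `/v2/health/ready` readiness endpoint are part of the server's documented interface.

```yaml
# Illustrative sketch only: running NVIDIA Triton Inference Server on Kubernetes/OpenShift.
# Image tag, model-repository path, replica/GPU counts, and PVC name are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton-llm
spec:
  replicas: 2
  selector:
    matchLabels:
      app: triton-llm
  template:
    metadata:
      labels:
        app: triton-llm
    spec:
      containers:
        - name: triton
          image: nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3  # assumed tag
          args: ["tritonserver", "--model-repository=/models"]
          ports:
            - containerPort: 8000   # HTTP inference
            - containerPort: 8001   # gRPC inference
            - containerPort: 8002   # Prometheus metrics (feeds availability monitoring)
          resources:
            limits:
              nvidia.com/gpu: 1     # one GPU per replica (assumed)
          readinessProbe:
            httpGet:
              path: /v2/health/ready
              port: 8000
          volumeMounts:
            - name: models
              mountPath: /models
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: llm-model-repo   # assumed PVC holding the TensorRT-LLM model repo
```

A Service in front of these replicas provides the load balancing mentioned in the requirements, and scraping port 8002 covers the performance/availability monitoring bullet.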