ONSITE
Core Experience
* Hands-on experience deploying open-source LLMs such as Meta Llama 3 and Mistral / Mixtral in on-prem or private environments
* Strong proficiency in Python for LLM inference, prompt engineering, and integration
* Experience with CPU-based inference, model quantization, and performance tuning
Vector Databases & RAG
* Practical experience with open-source vector databases such as Qdrant, Chroma, Milvus, or pgvector
* Proven track record implementing Retrieval-Augmented Generation (RAG) pipelines
* Experience generating and managing embeddings and applying metadata filtering
Security & Governance
* Understanding of data privacy, air-gapped deployments, and enterprise security requirements
* Experience implementing access controls and audit logging
Nice to Have
* Experience with LangChain or LlamaIndex
* Exposure to Rust, Go, or C++ for high-performance services
* Familiarity with Docker and Kubernetes for on-prem deployments
* Knowledge of inference frameworks (e.g., vLLM, llama.cpp, Hugging Face Transformers)
* Prior work in regulated or enterprise environments
Deliverables
* Reference architecture and deployment guidance
* Working prototype (LLM + vector DB + RAG)
* Documentation and knowledge transfer to internal teams
Python Developer • Philadelphia, PA, United States