Job : Computer Vision / AI Engineer
Duration : Long term contract
Location : Orlando, FL
Job Type : Hybrid
Job Description :
How You'll Make an Impact
Designing, building, and optimizing all aspects of large-scale training and fine-tuning, from data loading to inference, to maximize Model FLOPs Utilization (MFU) on large compute clusters.
Working closely and proactively with research scientists to translate models and algorithms into high-performance, production-ready code, integrating and testing the latest advancements.
Relentlessly profiling and resolving training performance bottlenecks, optimizing the entire training stack for speed and efficiency.
Contributing to the evaluation and selection of hardware, software, and cloud services for the AI infrastructure platform.
Using MLOps frameworks (MLflow, Weights & Biases, etc.) to apply best practices across the model lifecycle, ensuring reproducibility, reliability, and continuous improvement.
Creating thorough documentation for infrastructure and training procedures, staying updated on advancements in training strategies, and driving improvements in workflows and infrastructure.
What You Bring
Master's degree or higher in Computer Science, Engineering, or a related technical field.
5 or more years of experience as a Data & AI (Artificial Intelligence) Engineer or Machine Learning Engineer, focusing on building and optimizing infrastructure for large-scale machine learning systems.
Deep practical expertise with AI frameworks (PyTorch, JAX, PyTorch Lightning, etc.), large-scale multi-node GPU training, and optimization strategies for large foundation models on distributed compute infrastructure.
Excellent problem-solving, debugging, and performance optimization skills, with a data-driven approach to identifying and resolving technical challenges.