Job Title: Hybrid Cloud MLOps Architect
Location: Research Triangle Park, NC (Hybrid, 4 days onsite)
Duration: Long-Term Contract
Overview
We are seeking an experienced Hybrid Cloud MLOps Architect with 10+ years of experience designing, deploying, and scaling machine learning platforms across multi-cloud environments. This role requires strong technical leadership, expertise in cloud-native ML infrastructure, and the ability to collaborate closely with data science, platform engineering, and business stakeholders.
The Architect will lead cross-functional discussions, translate business needs into technical ML platform requirements, design end-to-end MLOps architectures, and ensure industry best practices for model lifecycle automation, governance, and production reliability.
Key Responsibilities
Architecture & Solution Design
Lead the design and architecture of hybrid cloud MLOps platforms enabling scalable model training, deployment, monitoring, and governance.
Architect solutions integrating on-prem, public cloud, and containerized ML workloads.
Evaluate requirements from data science, analytics, and engineering teams; propose modern, scalable approaches aligned with MLOps best practices.
Build and maintain reusable ML pipelines, workflows, and CI/CD automations.
MLOps Platform Engineering
Design and optimize model deployment frameworks using tools like Kubeflow, MLflow, SageMaker, Vertex AI, Azure ML, or similar.
Create robust systems for model versioning, artifact tracking, automated retraining, and auditability.
Implement monitoring, observability, and drift detection systems for production ML models.
Ensure secure, compliant, and efficient data/feature pipelines across hybrid cloud environments.
Cross-Functional Leadership
Facilitate architecture workshops with data scientists, cloud architects, IT security, and business teams.
Translate complex ML platform concepts into clear recommendations and technical strategies.
Mentor engineers and lead solution walk-throughs, design reviews, and deployment planning sessions.
Issue Triage & Support
Lead triage and root-cause analysis for ML pipeline failures, performance degradation, and deployment issues.
Build resilient, fault-tolerant systems with strong SLAs for production ML workloads.
Provide go-live support, documentation, user training, and post-deployment optimization.
Mandatory Skills
10+ years of experience in cloud engineering, ML platform development, or MLOps architecture.
Deep expertise with hybrid cloud platforms (AWS, Azure, GCP, or on-prem clusters).
Strong experience with:
MLOps frameworks (MLflow, Kubeflow, TFX, Airflow)
Cloud-native technologies (Docker, Kubernetes, Terraform)
ML lifecycle automation (CI/CD, model deployment, monitoring)
Solid understanding of feature stores, model registries, vector databases, and modern ML infrastructure.
Demonstrated ability to lead discussions with engineering, data science, and business stakeholders.
Preferred Skills
Experience designing secure ML architectures with governance and auditability.
Background in supporting production AI/ML workloads at enterprise scale.
Knowledge of DevSecOps, cloud compliance, and infrastructure-as-code.
Experience in regulated environments (finance, healthcare, telecom) is beneficial.
Role Type
Hybrid architecture leadership + hands-on engineering
Hybrid onsite model: 4 days onsite per week in Research Triangle Park, NC