JOB DETAILS:
Sr DevOps Engineer - AI platform
Location: Westlake Village, CA (onsite)
Contract Duration: 6-month contract-to-hire (converts to full-time employment)
Hourly Rate: $60-$72/hr on W2 contract
Job Description
Responsibilities:
The Sr DevOps Engineer - AI platform will:
- Design, implement, and manage scalable and resilient infrastructure on AWS.
- Architect and maintain Windows/Linux-based environments, ensuring seamless integration with cloud platforms.
- Develop and maintain infrastructure as code (IaC) using both AWS CloudFormation/CDK and Terraform/OpenTofu.
- Develop and maintain configuration management for Windows and Linux servers using Chef.
- Design, build, and optimize CI/CD pipelines using GitLab CI/CD for .NET applications.
- Integrate and support AI services, including orchestration with AWS Bedrock, Google Agentspace, and other generative AI frameworks, ensuring platform services can consume them securely and efficiently.
- Enable AI/ML workflows by building and optimizing infrastructure pipelines that support large-scale model training, inference, and deployment across AWS and GCP environments.
- Automate model lifecycle management (training, deployment, monitoring) through CI/CD pipelines, ensuring reproducibility and seamless integration with development workflows.
- Collaborate with AI engineering teams to deliver scalable environments, standardized APIs, and infrastructure that accelerate AI adoption at the platform level.
- Implement observability, security, data privacy, and cost-optimization strategies specifically for AI workloads, including monitoring and resource scaling for inference services.
- Implement and enforce security best practices across the infrastructure and deployment processes.
- Collaborate closely with development teams to understand their needs and provide DevOps expertise.
- Troubleshoot and resolve infrastructure and application deployment issues.
- Implement and manage monitoring and logging solutions to ensure system visibility and proactive issue detection.
- Contribute to the development of DevOps standards and best practices, and document them clearly and concisely.
- Stay up-to-date with the latest industry trends and technologies in cloud computing, DevOps, and security.
- Provide mentorship and guidance to junior team members.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).
- 5+ years of experience in a DevOps or Site Reliability Engineering (SRE) role.
- 1+ year of experience with AI services and LLMs.
- Extensive hands-on experience with Amazon Web Services (AWS).
- Solid understanding of Windows/Linux server administration and integration with cloud environments.
- Proven experience with infrastructure-as-code tools, specifically AWS CDK and Terraform.
- Strong experience designing and implementing CI/CD pipelines using GitLab CI/CD.
- Experience deploying and managing .NET applications in cloud environments.
- Deep understanding of security best practices and their implementation in cloud infrastructure and CI/CD pipelines.
- Solid understanding of networking principles (TCP/IP, DNS, load balancing, firewalls) in cloud environments.
- Experience with monitoring and logging tools (e.g., New Relic, CloudWatch).
- Strong scripting skills (e.g., PowerShell, Python, Ruby, Bash).
- Excellent problem-solving and troubleshooting skills.
- Strong communication and collaboration skills.
- Experience with the configuration management tool Chef.
- Experience with containerization technologies (e.g., Docker, Kubernetes) is a plus.
- Relevant AWS and/or GCP certifications are a plus.
Preferred Qualifications:
- Strong knowledge of PowerShell and Python scripting.
- Strong background with AWS EC2 features and services (Auto Scaling and warm pools).
- Understanding of the Windows Server build process, using tools such as Chocolatey for packages and Packer for AMI/image generation.
- Extensive hands-on experience with Amazon Web Services (AWS).