Talent.com
AI DevOps and Cloud Infrastructure Engineer
Crowe • Los Angeles, CA, US
12 days ago
Job type
  • Full-time
Job description

Your Journey at Crowe Starts Here

At Crowe, you can build a meaningful and rewarding career. With real flexibility to balance work with life moments, you're trusted to deliver results and make an impact. We embrace you for who you are, care for your well-being, and nurture your career. Everyone has equitable access to opportunities for career growth and leadership. Over our 80-year history, delivering excellent service through innovation has been a core part of our DNA across our audit, tax, and consulting groups. That's why we continuously invest in innovative ideas, such as AI-enabled insights and technology-powered solutions, to enhance our services. Join us at Crowe and embark on a career where you can help shape the future of our industry.

About Crowe AI Transformation

Everything we do is about making the future of human work more purposeful. We do this by leveraging state-of-the-art technologies, modern architecture, and industry experts to create AI-powered solutions that transform the way our clients do business.

The new AI Transformation team will build on Crowe's established AI foundation, furthering the capabilities of our Applied AI/Machine Learning team. By combining Generative AI, Machine Learning, and Software Engineering, this team empowers Crowe clients to transform their business models through AI, irrespective of their current AI adoption stage.

As a member of AI Transformation, you will help distinguish Crowe in the market and drive the firm's technology and innovation strategy. The future is powered by AI; come build it with us.

About the Team

  • We invest in expertise. You'll have the time, space, and support to go deep in your projects and build lasting technical and strategic mastery. You'll work with developers, product stakeholders, and project managers as a trusted leader and domain expert.
  • We believe in continuous growth. Our team is committed to professional development and knowledge-sharing.
  • We protect balance. Our distributed team culture is grounded in trust and flexibility. We offer unlimited PTO, a flexible remote work policy, and a supportive environment that prioritizes sustainable, long-term performance.

About the Role

The AI DevOps and Cloud Infrastructure Manager leads teams responsible for designing, operating, and scaling AI/ML infrastructure, cloud platforms, and DevOps automation that support enterprise model training, inference, and generative AI workloads. This role owns the strategy and execution of cloud-native, Kubernetes-based platforms that enable reliable, secure, and cost-efficient AI systems.

As a manager, this position combines hands-on technical leadership with people management, delivery ownership, and strategic decision-making. The role oversees distributed compute environments, GPU clusters, CI/CD pipelines, and vector-search infrastructure while ensuring high availability, resilience, and compliance with security and responsible AI standards. The manager partners closely with AI engineering, data engineering, product, and security teams, serves as the primary technical owner for assigned initiatives, and communicates system risks, tradeoffs, and progress to leadership.

Key responsibilities include:

  • Leading engineering teams responsible for AI/ML infrastructure, cloud operations, and MLOps automation.
  • Defining cloud, Kubernetes, and infrastructure strategy to support scalable model training, inference, and generative AI platforms.
  • Guiding the design and operation of distributed compute environments, GPU clusters, and vector database infrastructure.
  • Overseeing CI/CD pipelines that automate model training, testing, deployment, monitoring, and lifecycle management.
  • Managing incident response, failure analysis, and reliability engineering across AI platforms.
  • Directing performance testing, capacity planning, and cost optimization for AI infrastructure.
  • Ensuring compliance with cloud security, IAM practices, governance requirements, and responsible AI frameworks.
  • Implementing multi-cloud resilience patterns, high availability, and automated failover for critical AI workloads.
  • Supporting platform modernization initiatives, including adoption of optimized LLM runtimes and new orchestration technologies.
  • Evaluating third-party infrastructure tools, GPU scheduling solutions, and platform enhancements.
  • Communicating system status, dependencies, risks, and technical decisions to senior leadership.
  • Managing 4-5 direct reports, including coaching, performance management, and career development.
  • Owning project delivery, including budget, timelines, and quality of outcomes.
  • Coordinating with sales and stakeholders on project sizing, feasibility, and strategic opportunities.
  • Driving continuous improvement initiatives to advance DevOps maturity and AI infrastructure operational readiness.

Qualifications

  • 7+ years of professional experience in DevOps, cloud engineering, MLOps, or platform engineering.
  • 2+ years of experience in engineering leadership or senior technical leadership roles.
  • Expert proficiency with distributed cloud systems, Kubernetes, and infrastructure-as-code.
  • Advanced ability to troubleshoot infrastructure, networking, container, and deployment issues.
  • Proficiency in Python, Bash, or similar automation and scripting languages.
  • Strong understanding of monitoring, observability, and reliability engineering patterns.
  • Hands-on experience supporting infrastructure for ML or generative AI workloads.
  • Strong leadership, communication, and cross-functional collaboration skills.

Preferred Qualifications

  • Bachelor's degree in computer science, engineering, cloud computing, or a related field.
  • Master's degree in a technical discipline.
  • Cloud and AI certifications, including Azure (AZ-900, AZ-104, AZ-305, AZ-700, AZ-800, AI-102) or equivalent AWS/GCP certifications.
  • Extensive experience with Kubernetes platforms (EKS, AKS, GKE) and cloud ML services (Azure ML, SageMaker).
  • Experience with GPU workload orchestration, optimization, and multi-tenant inference environments.
  • Expertise in observability and distributed tracing (Prometheus, Grafana, CloudWatch, OpenTelemetry).
  • Strong experience with Terraform and infrastructure governance at scale.
  • Familiarity with service mesh architectures (Istio, Linkerd) and advanced deployment patterns (blue/green, canary).
  • Advanced experience supporting generative AI platforms, including LLM inference runtimes (vLLM, TGI), RAG infrastructure, and vector databases (Pinecone, Weaviate, FAISS).
  • Experience operating fine-tuned LLMs (LoRA, QLoRA), managing GenAI CI/CD pipelines, and implementing hallucination, drift, and reliability monitoring.
  • Demonstrated ability to make strategic technical decisions within defined delivery and budget constraints.