Talent.com
Principal AI Infrastructure Abstraction Engineer

Cisco Systems, Inc., San Jose, CA, United States
Job type
  • Full-time
Job description

This position requires a hybrid working schedule in the San Jose or Milpitas office.

Meet the Team

We are an innovation team on a mission to transform how enterprises harness AI. Operating with the agility of a startup and the focus of an incubator, we're building a tight-knit group of AI and infrastructure experts driven by bold ideas and a shared goal: to rethink systems from the ground up and deliver faster, leaner, and smarter breakthrough solutions that redefine what's possible.

We thrive in a fast-paced, experimentation-rich environment where new technologies aren't just welcome; they're expected. Here, you'll work side-by-side with seasoned engineers, architects, and thinkers to craft the kind of iconic products that can reshape industries and unlock entirely new models of operation for the enterprise.

If you're energized by the challenge of solving hard problems, love working at the edge of what's possible, and want to help shape the future of AI infrastructure, we'd love to meet you.

Your Impact

As an AI Infrastructure Abstraction Engineer, you will help shape the next generation of AI compute platforms by designing systems that abstract away hardware complexity and expose logical, scalable, and secure interfaces for AI workloads. Your work will enable multi-tenancy, resource isolation, and dynamic scheduling of GPUs and accelerators at scale, making infrastructure programmable, elastic, and developer-friendly.

You will bridge the gap between raw compute resources and AI/ML frameworks, allowing infrastructure teams and model developers to consume shared GPU resources with the performance and reliability of bare metal but the flexibility of cloud-native systems. Your contributions will empower internal and external users to run AI workloads securely, efficiently, and predictably, regardless of the underlying hardware topology.

This role is critical to enabling AI infrastructure that is multi-tenant by design, scalable in practice, and abstracted for portability across diverse platforms.

KEY RESPONSIBILITIES

  • Design and implement infrastructure abstractions that cleanly separate logical compute units (vGPUs, GPU pods, AI queues) from physical hardware (nodes, devices, interconnects).
  • Develop runtime services, APIs, and control planes to expose GPU and accelerator resources to users and frameworks with multi-tenant isolation and QoS guarantees.
  • Architect systems for secure GPU sharing, including time-slicing, memory partitioning, and namespace isolation across tenants or jobs.
  • Collaborate with platform, orchestration, and scheduling teams to map logical resources to physical devices based on utilization, priority, and topology.
  • Define and enforce resource usage policies, including fair sharing, quota management, and oversubscription strategies.
  • Integrate with model training and serving frameworks (e.g., PyTorch, TensorFlow, Triton) to ensure smooth and predictable resource consumption.
  • Build observability and telemetry pipelines to trace logical-to-physical mappings, usage patterns, and performance anomalies.
  • Partner with infrastructure security teams to ensure secure onboarding, access control, and workload isolation in shared environments.
  • Support internal developers in adopting abstraction APIs, ensuring high performance while abstracting away low-level details.
  • Contribute to the evolution of internal compute platform architecture, with a focus on abstraction, modularity, and scalability.

Minimum Qualifications

  • Bachelor's degree + 15 years of related experience, or Master's degree + 12 years of related experience, or PhD + 8 years of related experience.
  • Experience building scalable, production-grade infrastructure components or control planes using Go, Python, and C++.
  • Experience with virtualization, containerization, and orchestration frameworks such as Kubernetes, Docker, or KubeVirt.
  • Experience designing or implementing logical resource abstractions for compute, storage, or networking, with a focus on multi-tenant environments.
  • Experience integrating with AI/ML platforms or pipelines (e.g., PyTorch, TensorFlow, Triton Inference Server, MLflow).

Preferred Qualifications

  • Experience with GPU sharing, scheduling, or isolation techniques (e.g., MPS, MIG, time-slicing, device plugin frameworks, or vGPU technologies).
  • Solid grasp of resource management concepts including quotas, fairness, prioritization, and elasticity.

#WeAreCisco

#WeAreCisco, where every individual brings their unique skills and perspectives together to pursue our purpose of powering an inclusive future for all.

Our passion is connection: we celebrate our employees' diverse set of backgrounds and focus on unlocking potential. Cisconians often experience one company, many careers, where learning and development are encouraged and supported at every stage. Our technology, tools, and culture pioneered hybrid work trends, allowing all to not only give their best, but be their best.

We understand our outstanding opportunity to bring communities together, and at the heart of that is our people. One-third of Cisconians collaborate in our 30 employee resource organizations, called Inclusive Communities, to connect, foster belonging, learn to be informed allies, and make a difference. Dedicated paid time off to volunteer (80 hours each year) allows us to give back to causes we are passionate about, and nearly 86% do!

Our purpose, driven by our people, is what makes us the worldwide leader in technology that powers the internet. Helping our customers reimagine their applications, secure their enterprise, transform their infrastructure, and meet their sustainability goals is what we do best. We ensure that every step we take is a step towards a more inclusive future for all. Take your next step and be you, with us!
