Talent.com
Senior Software Engineer, Cloud-Native Stack CSP Engagements
NVIDIA • Santa Clara, CA, United States
3 days ago
Job type
  • Full-time
Job description

We are developing advanced multi-rack, multi-tenant AI / ML datacenters with NVIDIA GB200 and upcoming GB300 GPUs. NVIDIA seeks a Senior Software Engineer for our CSP (Cloud Service Provider) Engagements team to focus on the cloud-native stack for datacenter products like GB200. In this role, you will define customer workflows, prototype stack enhancements, and debug the toughest Kubernetes + Slurm issues in multi-rack, multi-tenant AI datacenters. You'll also tackle complex scheduling challenges across racks, tenants, and clouds.

What you'll be doing:

Perform deep-dive debugging of multi-rack, multi-tenant clusters: scheduler behavior, container runtime issues, device-plugin crashes, RDMA / IB fabric anomalies, etc.

Gather customer requirements and prototype feature extensions for Kubernetes operators, Slurm plugins, and custom micro-services that expose new GPU capabilities.

Drive joint architecture reviews and whiteboard sessions with CSP and internal platform teams; convert findings into RFCs and upstream pull requests.

Create reproducible testbeds (Helm / Ansible / Terraform) that mirror customer environments; automate validation and benchmark suites.

Deliver technical collateral (design docs, how-to guides, demo scripts) and present at customer on-sites, KubeCon, and SlurmUG.

Collaborate with AE, FAE, and Solution Architect teams to deliver integrated customer solutions and technical documentation.

What we need to see:

Strong source-level expertise in Kubernetes internals (scheduler, CRI / CNI / CSI, operators) and Slurm (federation, power-save, plugins).

Hands-on experience integrating next-gen GPUs (Blackwell / GB200 / GB300) or comparable accelerators into containerized clusters.

Proven track record debugging large-scale, cloud-native stacks across networking (RDMA / RoCE), storage, and control planes.

Customer-facing engineering or solutions-architect background: requirements gathering, PoC ownership, roadmap influence.

Familiarity with CI / CD (GitHub Actions, Tekton), observability (Prometheus, OpenTelemetry), and infrastructure-as-code.

Excellent communication: able to switch between deep technical detail and high-level business impact.

6+ years of professional software development experience in distributed systems (Go, Rust, C / C++ or Python for tooling).

BS or MS (or equivalent experience) in Computer Engineering, Computer Science, or related field.

Ways to stand out from the crowd:

Upstream contributions to Kubernetes, Slurm, Volcano, or similar projects.

Experience with GPU computing (CUDA) and deep learning workloads.

NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. If you're creative, hardworking and self-motivated, we want to hear from you!

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until August 2, 2025. NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

