Talent.com
Senior Software Engineer, Cloud-Native Stack CSP Engagements

NVIDIA • Santa Clara, CA, United States
1 day ago
Job type
  • Full-time
Job description

We are developing advanced multi-rack, multi-tenant AI / ML datacenters with NVIDIA GB200 and upcoming GB300 GPUs. NVIDIA seeks a Senior Software Engineer for our CSP (Cloud Service Provider) Engagements team to focus on the cloud-native stack for datacenter products like GB200. In this role, you will define customer workflows, prototype stack enhancements, and debug the toughest Kubernetes + Slurm issues in multi-rack, multi-tenant AI datacenters. You'll tackle complex scheduling challenges across racks, tenants, and clouds.

What you'll be doing:

Perform deep-dive debugging of multi-rack, multi-tenant clusters: scheduler behavior, container runtime issues, device-plugin crashes, RDMA / IB fabric anomalies, etc.

Gather customer requirements and prototype feature extensions for Kubernetes operators, Slurm plugins, and custom micro-services that expose new GPU capabilities.

Drive joint architecture reviews and whiteboard sessions with CSP and internal platform teams; convert findings into RFCs and upstream pull requests.

Create reproducible testbeds (Helm / Ansible / Terraform) that mirror customer environments; automate validation and benchmark suites.

Deliver technical collateral (design docs, how-to guides, demo scripts) and present at customer on-sites, KubeCon, and SlurmUG.

Collaborate with AE, FAE, and Solution Architect teams to deliver integrated customer solutions and technical documentation.

What we need to see:

Strong source-level expertise in Kubernetes internals (scheduler, CRI / CNI / CSI, operators) and Slurm (federation, power-save, plugins).

Hands-on experience integrating next-gen GPUs (Blackwell / GB200 / GB300) or comparable accelerators into containerized clusters.

Proven track record debugging large-scale, cloud-native stacks across networking (RDMA / RoCE), storage, and control planes.

Customer-facing engineering or solutions-architect background: requirements gathering, PoC ownership, roadmap influence.

Familiarity with CI / CD (GitHub Actions, Tekton), observability (Prometheus, OpenTelemetry), and infrastructure-as-code.

Excellent communication: able to switch between deep technical detail and high-level business impact.

6+ years of professional software development experience in distributed systems (Go, Rust, C / C++ or Python for tooling).

BS or MS (or equivalent experience) in Computer Engineering, Computer Science, or related field.

Ways to stand out from the crowd:

Upstream contributions to Kubernetes, Slurm, Volcano, or similar projects.

Experience with GPU computing (CUDA) and deep learning workloads.

NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. If you're creative, hardworking and self-motivated, we want to hear from you!

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until August 2, 2025. NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

