Talent.com
Distinguished Software Architect - Deep Learning and HPC Communications

NVIDIA, Santa Clara, CA, United States
1 day ago
Job type
  • Full-time
Job description

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions from artificial intelligence to autonomous cars.

We are the GPU Communications Libraries and Networking team at NVIDIA. We deliver communication libraries such as NCCL, NVSHMEM, and UCX for Deep Learning and HPC. We are looking for a Distinguished Software Architect to help co-design our next-generation data center platforms. DL and HPC applications already have huge compute demands and run at scales of up to tens of thousands of GPUs. The GPUs are connected with high-speed interconnects (e.g. NVLink, PCIe) within a node and with high-speed networking (e.g. InfiniBand, Ethernet) across nodes. Communication performance between the GPUs directly affects end-to-end application performance, and the stakes are even higher at huge scales! This is an outstanding opportunity to push the limits of the state of the art and deliver platforms the world has never seen before. Are you ready to contribute to the development of innovative technologies and help realize NVIDIA's vision?

What you will be doing:

Research new communication technologies (e.g. expand the GPUDirect technology portfolio) and design new features for our communication libraries

Propose innovative solutions in HW and SW for our next-gen platforms. You will co-design these solutions with the GPU, Networking, and SW architects and ensure seamless integration with the software stacks

Inspire changes based on quantitative data from proofs of concept or detailed technical analysis and modeling

Drive the adoption of new communication technologies across application verticals

Keep up with the latest DL research and collaborate with diverse teams (internal and external), including DL researchers and customers

What we need to see:

PhD in Computer Science, Computer Engineering, or a related field, or strong equivalent experience; 15+ years of relevant experience in academia or industry

Expert in the following areas: HPC, parallel programming models (MPI, SHMEM), at least one communication runtime (MPI, NCCL, NVSHMEM, OpenSHMEM, UCX, UCC), computer and system architecture, GPU architecture, and CUDA

Deep understanding of various aspects of high-performance networking from prior work experience: network technologies (InfiniBand, Ethernet), network design, network topologies, network debugging, and performance analysis

Strong in at least a few of these areas: ML / DL fundamentals and how they tie to communications, parallel algorithms, fault tolerance and resiliency, competitive assessments, performance analysis and optimization of parallel applications on large clusters, and developing applications using DL frameworks (PyTorch, TensorFlow)

Programming fluency in C or C++ for systems software development

Flexibility to work and communicate effectively across different HW / SW teams and time zones

Ways to stand out from the crowd:

Industry-recognized leader in HPC / DL communications with a history of patents, publications, conference talks, and keynotes in areas relevant to this role

Influential role in industry standards (e.g. MPI, OpenSHMEM) and open source software (e.g. PyTorch, UCX, Open MPI)

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us. If you're creative and autonomous, we want to hear from you!

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 308,000 USD - 471,500 USD.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until November 13, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
