Talent.com
Software Manager, AI Infrastructure System
No longer accepting applications
NVIDIA • US
30+ days ago
Job type
  • Full-time
Job description

NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 fueled the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI and enabled the next era of computing. NVIDIA is a “learning machine” that constantly evolves by adapting to new opportunities that are hard to address, that matter to the world, and that only we can address. This is our life’s work, to amplify human imagination and intelligence, and expand what is possible. We’re seeking strategic, bold, hard-working, and creative individuals who are passionate about helping us tackle challenges no one else can solve. Make the choice to join us today.

We are looking for an AI Infrastructure System Software Manager to join our mission to continue improving our HPC infrastructure. Our team builds and operates sophisticated infrastructure to enable business-critical services and AI applications. You will be working with a team of passionate and skilled engineers who are continuously working to provide better tools to build and manage this infrastructure. The ideal candidate is strong in software development, in designing and creating reliable distributed systems, and has the ability to implement a well-thought-out long-term maintenance strategy.

What you'll be doing:

Mentor, grow, and develop a world-class team of AI infrastructure engineers.

Work across several teams and orgs to build products that use LLMs and agent systems to serve the needs of NVIDIA engineering teams. In this role, you will collaborate with research and infra teams and serve a large user base (hardware / software teams across NVIDIA).

Align priorities across collaborators and define metrics for measuring the success of the product / team.

Develop and execute strategies for scalable, reliable, and secure AI infrastructure supporting both research and production workloads.

Ensure robust monitoring, logging, visualization, and alerting capabilities to guarantee promised uptime and operational excellence.

Architect, design, develop, and maintain infrastructure and large-scale applications for LLM-based solutions. Optimize these systems for performance, scalability, reliability, and secure data management.

Stay updated with the latest trends in AI, ML, and infrastructure, proactively seeking opportunities to integrate advancements into NVIDIA’s LLM and AI infrastructure solutions.

What we need to see:

10+ years of overall industry experience developing software for large distributed systems.

BS or higher degree in CS or a related field, or equivalent experience.

5+ years of experience managing AI and SW development teams.

Familiarity with modern software development stacks and tools, including containerization, cloud or on-premises deployments, API integration for seamless model operation, and real-time processing frameworks.

Experience in developing and maintaining LLM or GenAI infrastructure.

Excellent communication, collaboration and problem-solving skills, with a dedication to encouraging an inclusive and diverse workplace.

Hands-on experience developing large-scale distributed systems.

Ways to stand out from the crowd:

Strong technical background in cloud / distributed infrastructure

Experience debugging functional and performance issues in HPC GPU clusters

Background in running and instrumenting distributed LLM training on a multi-GPU HPC cluster

Experience with HPC schedulers such as Slurm

Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer to you and your family.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 224,000 USD - 356,500 USD for Level 3, and 272,000 USD - 425,500 USD for Level 4.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until July 29, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
