Senior High Performance Computing Cluster Administrator
NVIDIA • Remote, CA, US
30+ days ago
Job type
  • Full-time
  • Remote
Job description

NVIDIA's Deep Learning Optimized Frameworks Group is looking for a deeply technical HPC cluster administrator to lead a diverse cluster of GPU-accelerated systems and provide architectural mentorship to product teams in the deep learning and scientific computing domains. As a member of the DLFW Infrastructure team, you will lead the design and implementation of a groundbreaking GPU compute cluster that runs demanding deep learning, high-performance computing, and other computationally intensive workloads. We are looking for an expert who can identify architectural changes, or entirely new approaches, for our GPU compute cluster. In this role, you will help us tackle our strategic challenges, including compute, networking, and storage design for large-scale, high-performance workloads, and effective resource utilization in a heterogeneous compute environment.

What you'll be doing:

  • Administer Linux systems, ranging from powerful DGX servers to embedded systems, and from bring-up hardware to publicly available systems.

  • Coordinate storage solutions and plan for growth.

  • Automate configuration management, software updates, maintenance, and system-availability monitoring using modern DevOps tools (Ansible, GitLab, etc.).

  • Proactively communicate with management about any equipment problems and propose resolutions.

  • Plan, build, and install or upgrade new systems that support NVIDIA DL software.

What we need to see:

  • You have a BA, BS, or MS in CS, EE, CE or equivalent experience

  • 4+ years of experience deploying and administering HPC clusters

  • Familiarity with resource managers and schedulers (Slurm preferred, LSF, etc.)

  • Proven ability to script in Bash, Perl, or Python

  • Experience with containers (Docker, Singularity, LXC)

  • Deep understanding of operating systems, computer networks, and high-performance applications

  • Ability to work well with developers & test engineers

  • Dedication to providing high-quality support for your users

Ways to stand out from the crowd:

  • Familiarity and prior work experience with technologies such as Ansible, Git, Slurm, Zabbix, Prometheus, Grafana, and Docker

  • Familiarity with GPU usage in compute clusters and with CUDA

  • Experience with mobile and embedded systems

  • Basic knowledge of Deep Learning.

  • Experience coding and scripting in Perl, Python, or Bash

The base salary range is 148,000 USD - 230,000 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
