Talent.com
Senior High Performance Computing Cluster Administrator
NVIDIA • Remote, CA, US
30+ days ago
Job type
  • Full-time
  • Remote
Job description

NVIDIA's Deep Learning Optimized Frameworks Group is looking for a deeply technical HPC cluster administrator to lead a diverse cluster of GPU-accelerated systems and provide architectural mentorship to product teams in the deep learning and scientific computing domains. As a member of the DLFW Infrastructure team, you will provide leadership in the design and implementation of a groundbreaking GPU compute cluster that runs demanding deep learning, high performance computing, and other computationally intensive workloads. We are looking for an expert to identify architectural changes and/or completely innovative approaches for our GPU compute cluster. In this role, you will help us with the strategic challenges we encounter, including compute, networking, and storage design for large-scale, high-performance workloads and effective resource utilization in a heterogeneous compute environment.

What you'll be doing:

Administer Linux systems, ranging from powerful DGX servers to embedded systems, and from bring-up hardware to publicly available systems.

Coordinate storage solutions and plan for growth.

Automate configuration management, software updates, maintenance, and monitoring of system availability using modern DevOps tools (Ansible, GitLab, etc.).

Actively communicate with management about any problems with the equipment and propose resolutions.

Plan, build, install, and upgrade systems that support NVIDIA DL software.

What we need to see:

A BA, BS, or MS in CS, EE, CE, or equivalent experience

4+ years of experience deploying and administering HPC clusters

Familiarity with resource scheduling managers (Slurm preferred, LSF, etc.)

Proven ability to script in Bash, Perl, or Python

Experience with containers (Docker, Singularity, LXC)

Deep understanding of operating systems, computer networks, and high-performance applications

Ability to work well with developers & test engineers

Dedication to providing quality support for your users

Ways to stand out from the crowd:

Familiarity and prior work experience with technologies such as Ansible, Git, Slurm, Zabbix, Prometheus, Grafana, and Docker

Familiarity with GPU usage in compute clusters and with CUDA

Experience with mobile and embedded systems

Basic knowledge of deep learning

Experience coding or scripting in Perl, Python, or Bash

The base salary range is 148,000 USD - 230,000 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
