Talent.com
Senior HPC Cluster Engineer

VirtualVocations • Alexandria, Virginia, United States
13 hours ago
Job type
  • Full-time
Job description

A company is looking for a Senior AI and ML HPC Cluster Engineer.

Key Responsibilities

Provide leadership and strategic guidance on managing large-scale HPC systems, including deployment of compute, networking, and storage

Develop and enhance the ecosystem around GPU-accelerated computing, including scalable automation solutions

Build and maintain AI and ML heterogeneous clusters both on-premises and in the cloud

Required Qualifications

Bachelor's degree in Computer Science, Electrical Engineering, or related field, or equivalent experience

Minimum of 5 years of experience designing and operating large-scale compute infrastructure

Experience with advanced AI/HPC job schedulers, such as Slurm, Kubernetes (K8s), PBS, RTDA, or LSF

Proficient in administering CentOS/RHEL and/or Ubuntu Linux distributions

Solid understanding of cluster configuration management tools such as Ansible, Puppet, or Salt
