Talent.com
No longer accepting applications
Slurm Administration & Systems Architecture

Midjourney, Sonoma, CA, US
30+ days ago
Job type
  • Full-time
Job description

Overview

We are seeking a highly skilled HPC / AI / ML Cluster Engineer to support the design, deployment, and ongoing operations of large-scale HPC environments powered by Slurm. This role centers on cluster engineering, administration, and performance optimization, with emphasis on GPU-accelerated computing, advanced networking, and workload scheduling. In this role, you will work closely with our researchers, vendors, and partners to manage Slurm clusters that are used for AI / ML workloads.

Responsibilities

Cluster Engineering & Deployment

  • Participate in the design and bring-up of bare metal HPC / AI / ML environments.
  • Architect compute node definitions (NUMA, GRES GPU topologies, CPU pinning) and Slurm partitioning strategies for diverse workloads.
  • Integrate heterogeneous hardware platforms into cohesive scheduling environments.
  • Develop provisioning and imaging workflows (Ansible, MAAS, cloud-init, CI / CD pipelines) for reproducible cluster build-out.
  • Coordinate communications between vendors, researchers, and other partners during cluster bring-up and operation.

Slurm Management

  • Configure and operate the Slurm Workload Manager.
  • Build custom Slurm plugins and scripts (epilog / prolog, pam_slurm_adopt) to extend functionality and integrate with authentication and monitoring.
  • Manage federated Slurm setups across multi-site or hybrid cloud environments.

System Administration & Monitoring

  • Administer Linux HPC environments, including network configuration, storage integration, and kernel tuning for HPC workloads.
  • Deploy and maintain observability stacks for system health, GPU metrics, and job monitoring.
  • Automate failure detection, node health checks, and job cleanup to ensure high uptime and reliability.
  • Manage security and access control (LDAP / SSSD, VPN, PAM, SSH session auditing).

User & Stakeholder Support

  • Assist cluster users with developing workflows that make efficient use of compute resources.
  • Containerize HPC applications with Docker / Podman / Enroot-Pyxis and integrate GPU-aware runtimes into Slurm jobs.
  • Automate cost accounting and cluster usage reporting.

Qualifications

  • 7+ years of experience in HPC cluster administration and engineering, with deep knowledge of Slurm.
  • Familiarity with common AI / ML software package dependencies and workflows.
  • Expert in Slurm configuration, partition design, QoS / preemption policies, and GRES GPU scheduling.
  • Strong background in Linux system administration, networking, and performance tuning for HPC environments.
  • Hands-on experience with parallel file systems, advanced networking (InfiniBand, RoCE, 100 / 200 GbE), and monitoring stacks.
  • Proficient with automation tools (Ansible, Terraform, CI / CD pipelines) and version control.
  • Demonstrated ability to operate GPU-accelerated clusters at scale.