Staff Software Engineer, Slurm

Crusoe Energy Systems LLC
San Francisco, CA, United States
30+ days ago
Job type
  • Full-time
Job description

Crusoe's mission is to accelerate the abundance of energy and intelligence. We’re crafting the engine that powers a world where people can create ambitiously with AI — without sacrificing scale, speed, or sustainability.

Be a part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that’s setting the pace for responsible, transformative cloud infrastructure.

About the Role:

We are actively seeking an exceptional Staff Software Engineer to join our cloud software team, focusing specifically on building and operating Slurm as a fully managed cloud service within Crusoe Cloud. This role is crucial for delivering next-generation orchestration capabilities to power GPU-accelerated and high-performance computing (HPC) at scale.

Your expertise will be instrumental in designing and scaling our carbon-reducing operating model, and advancing our AI training clusters to lead the industry in reliability and performance. You will shape the technical direction of systems that allow customers to run advanced workloads across CPUs, NVIDIA and AMD GPUs, and high-performance networking environments.

You will be involved in writing and reviewing code, contributing to proposals, and drafting architecture documents. You will evaluate tools and frameworks, considering their impact on reliability, scalability, operational costs, and ease of adoption.

What You'll Be Working On:

Lead the development and engineering of our managed Slurm offering, providing a seamless experience for AI/ML and HPC customers who rely on robust Slurm job scheduling.

Contribute to the development of scalable and robust software solutions, closely aligning with the strategic objectives outlined in the Crusoe Cloud roadmap.

Design, build, and maintain Kubernetes operators and controllers dedicated to managing the lifecycle, configuration, and state of large-scale Slurm clusters.

Drive the integration of GPU acceleration in the Slurm environment, including device plugin architecture, GPU operators, accelerator-aware scheduling, and resource allocation.

Ensure that high-performance networking technologies, such as InfiniBand and RoCE, are correctly leveraged for distributed GPU workloads running through Slurm.

Implement and manage features such as multi-tenancy, cluster lifecycle management, auto-scaling, and high availability for the managed Slurm control plane services.

Develop scalable systems to compete with leading managed services.

Support the development of your peers by sharing knowledge and providing guidance in technical discussions.

What You'll Bring to the Team:

You have 7+ years of experience working in software engineering, with strong experience in systems engineering. Experience in distributed systems, cloud, or HPC environments is a must.

You possess 2+ years of programming experience in Go. Strong proficiency in other systems languages (Rust, C++, Python for HPC tooling) is also beneficial.

You have extensive experience with Kubernetes and with Linux engineering and debugging.

You possess deep knowledge of Slurm (Simple Linux Utility for Resource Management) administration and the architecture required for managing compute jobs in high-performance environments.

You are skilled in infrastructure as code and familiar with systems-level challenges, ideally with experience utilizing Terraform.

You understand Argo, CI/CD, and automated testing pipelines. You can design and take ownership of system architecture, including CI/CD pipelines, while ensuring adherence to security standards.

Strong knowledge of container networking (CNI plugins, service meshes) and Linux networking fundamentals.

Familiarity with GPU integration in Kubernetes, including device plugins and GPU operators.

You have excellent communication skills, both verbal and written.

Compensation Range

Compensation will be paid in the range of $185,000 - $224,000. Restricted Stock Units are included in all offers. Compensation will be determined by the applicant's knowledge, education, and abilities, as well as internal equity and alignment with market data.

Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.

