Talent.com
Site Reliability Engineer - Storage
xAI • San Francisco, CA, United States
30+ days ago
Job type
  • Full-time
Job description

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers and researchers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

About the role

As a Site Reliability Storage Engineer, you will play a pivotal role in designing, building, and operating exascale storage systems that manage our AI research data scalably and reliably across multiple regions. The core responsibility of this role is to ensure that our heterogeneous storage systems, both on-prem and in the cloud, are reliable and performant.

We’re seeking engineers with expertise in exascale data management systems or distributed filesystems to join our mission-driven team.

What you’ll do

  • Develop and optimize software to manage exascale data, enabling efficient and reliable access for xAI researchers working on advanced AI models.
  • Enhance the reliability, performance, and cost-effectiveness of xAI’s storage infrastructure to support large-scale AI research workloads.
  • Collaborate closely with researchers to understand their data use cases and tailor storage solutions to meet their needs.
  • Implement robust security measures to safeguard critical datasets, ensuring data integrity and confidentiality.

Ideal Experience

You’d be an exceptional candidate if you possess some (or all) of the following:

  • Writing scalable, high-performance code in Rust or Go for storage-related applications or tooling.
  • Managing storage infrastructure with IaC tools like Pulumi, Terraform, or Ansible.
  • Experience working with storage vendors, facilitating partnership alignment, and integrating their tooling within xAI’s infrastructure.
  • Familiarity with Kubernetes storage primitives (e.g., Persistent Volumes, CSI drivers) and integrating storage with containerized workloads.
  • Bonus: Experience with AI/ML data pipelines, including handling large datasets for training and inference.
  • Rust and Go

Interview Process

After submitting your application, the team reviews your CV and statement of exceptional work. If your application passes this stage, you will be invited to a 45-minute interview (“phone interview”) during which a member of our team will ask some basic questions. If you clear the initial phone interview, you will enter the main process, which consists of four technical interviews:

  • Coding assessment in Python, Golang, or Rust.
  • Systems hands-on: Demonstrate practical skills in a live problem-solving session.
  • Coding assessment or system design discussion based on the candidate's background.
  • Project deep-dive: Present your past exceptional work to a small audience.

Every application is reviewed by a member of our technical team. All interviews will be conducted via Google Meet.

We do not condone usage of AI in interviews and have tools to detect AI usage.

Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.
