Talent.com
High-Performance Networking Engineer - Supercomputing
San Francisco & Palo Alto, CA

xAI, San Francisco, CA, United States
2 days ago
Job type
  • Full-time
Job description

High-Performance Networking Engineer - Supercomputing

As a High-Performance Networking Engineer on xAI's Supercomputing team, you will design and optimize low-latency, high-bandwidth networking solutions using NVIDIA's RDMA-capable technologies to support some of the world's largest GPU supercomputing clusters. These clusters drive AI training and inference workloads that demand high performance and scalability.

Focus

  • Develop and tune RDMA-based communication systems leveraging NVIDIA GPUs and Mellanox NICs (InfiniBand, RoCE) for ultra-fast data transfer between nodes.
  • Implement and optimize GPUDirect RDMA to enable direct memory access between GPUs and network interfaces, minimizing CPU overhead.
  • Integrate RDMA solutions with Kubernetes-based workloads, ensuring seamless operation across distributed compute and storage systems.
  • Collaborate with AI researchers and infrastructure teams to accelerate data pipelines and collective communications using NCCL and MPI.
  • Troubleshoot and resolve performance bottlenecks in high-throughput, low-latency networking environments.

Ideal Experience

  • Hands-on experience with NVIDIA RDMA technologies (e.g., GPUDirect RDMA, RoCE, InfiniBand) in HPC or AI supercomputing environments.
  • Proficiency in programming with Rust, C, or C++ for low-level networking and system optimization.
  • Familiarity with NVIDIA's networking stack, including Mellanox drivers, libraries (e.g., libibverbs), and tools (e.g., NVPeerMemory).
  • Experience optimizing distributed systems with MPI, NCCL, or similar frameworks for GPU-accelerated workloads.
  • Knowledge of Kubernetes networking and integrating RDMA into containerized environments.
  • Bonus: Background in AI/ML training workflows and their networking demands (e.g., large-scale parameter synchronization).

Tech Stack

  • NVIDIA GPUs and Mellanox networking (InfiniBand, RoCE)
  • RDMA protocols (e.g., GPUDirect RDMA, RoCEv2)
  • Kubernetes
  • Rust and C/C++
  • MPI (Message Passing Interface) and NCCL (NVIDIA Collective Communications Library)

Annual Salary Range

$180,000 - $440,000 USD

xAI is an equal opportunity employer and does not unlawfully discriminate based on race, color, religion, ethnicity, ancestry, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, age, disability, medical conditions, genetic information, marital status, military or veteran status, or any other applicable legally protected characteristics. Qualified applicants with arrest or conviction records will be considered for employment in accordance with all applicable federal, state, and local laws, including the San Francisco Fair Chance Ordinance, Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act.

For Los Angeles County (unincorporated) Candidates: xAI reasonably believes that criminal history may have a direct, adverse and negative relationship on the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: Access to information technology systems and confidential information, including proprietary and trade secret information, and/or user data; Interacting with internal and/or external clients and colleagues; and Exercising sound judgment.