Talent.com
Infrastructure Engineering - Traffic

xAI • San Francisco, CA, United States
30+ days ago
Job type
  • Full-time
Job description

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

About the Role

In this role, you will be a key contributor to xAI’s Supercomputing team, focusing on building and optimizing scalable, high-performance traffic platforms that power our production inference engines. You will work on critical systems that manage traffic flow, service discovery, and network reliability across both on-premise and cloud-based Kubernetes clusters. Collaborating closely with Network Fabric Engineers and other technical teams, you will drive projects that enhance the stability and efficiency of our AI infrastructure, including support for large-scale training runs for advanced models like Grok 4 and beyond. This role demands deep technical expertise in Kubernetes, L4 / L7 proxies like Envoy, and service discovery systems, along with a proactive approach to debugging and optimizing complex network performance issues from L3 to L7.

What you’ll do

  • Build and optimize traffic platforms that automate and simplify the lifecycle of production inference engines across dozens of on-premise and cloud clusters, managing core traffic primitives like load balancing, routing, overload control, authentication / authorization, and encryption in transit.
  • Manage, extend, and optimize xAI’s production inference capabilities with L4 / L7 proxies such as Envoy and NGINX.
  • Manage and extend xAI’s Service Discovery systems, both in and outside of Kubernetes (DNS, xDS control planes).
  • Collaborate with Network Fabric Engineers to improve host networking and fabric stability for large-scale training runs (e.g., Grok 4 and beyond).
  • Work with a fast, small technical team to execute projects in the critical path of xAI.

What we’d like to see

  • 2+ years of experience operating Kubernetes clusters, or experience writing and deploying controllers.
  • 2+ years of experience configuring and deploying Envoy, NGINX, HAProxy, or some other L7 software load balancer.
  • 1+ years of experience deploying and configuring Kubernetes CNI plugins (Calico, Cilium, Flannel) or experience with IPAM.
  • 1+ years of experience with DNS systems (e.g., CoreDNS, Unbound) or service discovery control planes (xDS).
  • 1+ years of experience with cloud networking primitives (VPC Route Tables, Cloud NAT, Peering / Transit Gateways, CDN, Cloudflare Workers or equivalent).
  • Experience with host level network proxies (iptables, nftables, IPVS, eBPF programs) is a plus.
  • Deep experience with gRPC Client libraries (grpcio / grpc-go / grpc-java) is a plus.
  • Experience with service mesh (Istio, Linkerd) is a plus.
  • Demonstrated experience working with Kubernetes and Envoy internals. Can you tell us how k8s cached clients work? Can you tell us how Envoy scales and manages state?
  • Demonstrated experience debugging performance and reliability issues that span L3 to L7. For example: how would a gRPC client in a cloud environment call a gRPC server running on-prem? Describe the entire network path and any issues to watch out for, including Service Discovery / DNS, gRPC channel management, egress proxies, VPC routing, peering / PNI, edge caching / CDN, L4 load-balancing devices, host networking and virtualization, k8s networking, L7 routing, TLS / authnz, and TCP / IP.

Location

This role is based in the Bay Area (San Francisco and Palo Alto). Candidates are expected to be located near the Bay Area or open to relocation.

  • Envoy / xDS
  • Golang and Rust

Interview Process

Application Review: Submit your CV and a statement of exceptional work. Our team will review your application to assess fit.

Phone Interview (45 minutes): A brief conversation with a team member to discuss your background, key accomplishments, and motivation.

Main Interview Process

  • 2 Coding Assessments: Solve problems in a language of your choice.
  • Systems Hands-On: Demonstrate practical skills in a live problem-solving session.
  • Project Deep-Dive: Present your past exceptional work to a small audience.

Annual Salary Range

$180,000 - $440,000 USD

Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.

Note

We welcome a variety of formats, such as public writings, presentations, or publications. Submission is optional but highly encouraged.
