Talent.com
Infrastructure Engineering - Traffic

xAI • San Francisco, CA, United States
30+ days ago
Job type
  • Full-time
Job description

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

About the Role

In this role, you will be a key contributor to xAI’s Supercomputing team, focusing on building and optimizing scalable, high-performance traffic platforms that power our production inference engines. You will work on critical systems that manage traffic flow, service discovery, and network reliability across both on-premise and cloud-based Kubernetes clusters. Collaborating closely with Network Fabric Engineers and other technical teams, you will drive projects that enhance the stability and efficiency of our AI infrastructure, including support for large-scale training runs for advanced models like Grok 4 and beyond. This role demands deep technical expertise in Kubernetes, L4 / L7 proxies like Envoy, and service discovery systems, along with a proactive approach to debugging and optimizing complex network performance issues from L3 to L7.

What you’ll do

  • Build and optimize traffic platforms that automate and simplify the lifecycle of production inference engines across dozens of on-premise and cloud clusters, managing core traffic primitives like load balancing, routing, overload control, authentication / authorization, and encryption in transit.
  • Manage, extend, and optimize xAI’s production inference capabilities with L4 / L7 proxies such as Envoy and NGINX.
  • Manage and extend xAI’s Service Discovery systems, both in and outside of Kubernetes (DNS, xDS control planes).
  • Collaborate with Network Fabric Engineers to improve host networking and fabric stability for large-scale training runs (e.g., Grok 4 and beyond).
  • Work with a fast, small technical team to execute projects in the critical path of xAI.

What we’d like to see

  • 2+ years of experience operating Kubernetes clusters, or experience writing and deploying controllers.
  • 2+ years of experience configuring and deploying Envoy, NGINX, HAProxy, or some other L7 software load balancer.
  • 1+ years of experience deploying and configuring Kubernetes CNI plugins (Calico, Cilium, Flannel) or experience with IPAM.
  • 1+ years of experience with DNS systems (e.g., CoreDNS, Unbound) or service discovery control planes (xDS).
  • 1+ years of experience with cloud networking primitives (VPC Route Tables, Cloud NAT, Peering / Transit Gateways, CDN, Cloudflare Workers, or equivalent).
  • Experience with host level network proxies (iptables, nftables, IPVS, eBPF programs) is a plus.
  • Deep experience with gRPC client libraries (grpcio / grpc-go / grpc-java) is a plus.
  • Experience with service mesh (Istio, Linkerd) is a plus.
  • Demonstrated experience working with Kubernetes and Envoy internals. Can you tell us how k8s cached clients work? Can you tell us how Envoy scales and manages state?
  • Demonstrated experience debugging performance and reliability issues that span from L3 to L7 (e.g., how would a gRPC client in a cloud environment call a gRPC server on an on-prem server? Describe the entire network path and any issues to watch out for, including Service Discovery / DNS, gRPC channel management, egress proxies, VPC routing, peering / PNI, edge caching / CDN, L4 load-balancing devices, host networking and virtualization, k8s networking, L7 routing, TLS / authnz, and TCP / IP).
Location

This role is based in the Bay Area (San Francisco and Palo Alto). Candidates are expected to be located near the Bay Area or open to relocation.

Envoy / xDS

Golang and Rust

Interview Process

Application Review: Submit your CV and a statement of exceptional work. Our team will review your application to assess fit.

Phone Interview (45 minutes): A brief conversation with a team member to discuss your background, key accomplishments, and motivation.

Main Interview Process

  • 2 Coding Assessments: Solve problems in a language of your choice.
  • Systems Hands-On: Demonstrate practical skills in a live problem-solving session.
  • Project Deep-Dive: Present your past exceptional work to a small audience.
Annual Salary Range

$180,000 - $440,000 USD

Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.

Note

We welcome a variety of formats, such as public writings, presentations, or publications. Submission is optional but highly encouraged.
