Software Engineer, Distributed Systems

OpenAI, San Francisco, CA, United States
30+ days ago
Job type
  • Full-time
Job description

Inference - San Francisco

About the Team

Our Inference team brings OpenAI’s most capable research and technology to the world through our products. We empower consumers, enterprises, and developers alike to use and access our state‑of‑the‑art AI models, allowing them to do things they’ve never been able to do before. We focus on performant and efficient model inference, as well as on accelerating research progress through model inference.

About the Role

We’re looking for a senior engineer to design and build the load balancer that will sit at the very front of our research inference stack—routing the world’s largest AI models with millisecond precision and bulletproof reliability. This system will serve research jobs where requests must stay “sticky” to the same model instance for hours or days and where even subtle errors can directly degrade model performance.

In this role, you will:

  • Architect and build the gateway / network load balancer that fronts all research jobs, ensuring long‑lived connections remain consistent and performant.
  • Design traffic stickiness and routing strategies that optimize for both reliability and throughput.
  • Instrument and debug complex distributed systems, with a focus on building world‑class observability and debuggability tools (distributed tracing, logging, metrics).
  • Collaborate closely with researchers and ML engineers to understand how infrastructure decisions impact model performance and training dynamics.
  • Own the end‑to‑end system lifecycle: from design and code to deploy, operate, and scale.
  • Work in an outcome‑oriented environment where everyone contributes across layers of the stack, from infra plumbing to performance tuning.

You might thrive in this role if you:

  • Have deep experience designing and operating large‑scale distributed systems, particularly load balancers, service gateways, or traffic routing layers.
  • Have 5+ years of experience with the algorithmic and systems challenges of consistent hashing, sticky routing, and low‑latency connection management, both designing for them in theory and debugging them in practice.
  • Have 5+ years of experience as a software engineer and systems architect working on high‑scale, high‑reliability infrastructure.
  • Have a strong debugging mindset and enjoy spending time in tracing, logs, and metrics to untangle distributed failures.
  • Are comfortable writing and reviewing production code in Rust or similar systems languages (C/C++, Java, Go, Zig, etc.).
  • Have operated in big tech or high‑growth environments and are excited to apply that experience in a faster‑moving setting.
  • Take ownership of problems end‑to‑end and are excited to build something foundational to how our models interact with the world.
Nice to have:

  • Experience with gateway or load balancing systems (e.g., Envoy, gRPC, custom LB implementations).
  • Familiarity with inference workloads (e.g., reinforcement learning, streaming inference, KV cache management).
  • Exposure to debugging and operational excellence practices in large production environments.
About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general‑purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic.

For additional information, please see OpenAI’s Aff… Statement.

Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non‑public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.

To notify OpenAI that you believe this job posting is non‑compliant, please submit a report through this form. No response will be provided to inquiries unrelated to job posting compliance.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.

Compensation

$325K – $490K + Offers Equity
