Software Engineer, Data Infrastructure - Research

OpenAI, San Francisco, CA, United States
30+ days ago
Job type
  • Full-time
Job description

About the Team

The Workload team is responsible for designing and running OpenAI’s LLM training and inference infrastructure that powers frontier models at massive scale. Our systems unify how researchers train and serve models, abstracting away the complexity of performance, parallelism, and execution across vast GPU/accelerator fleets. By providing this foundation, the Workload team ensures that researchers can focus on advancing model capabilities while we handle the scale, efficiency, and reliability required to bring those models to life.

About the Role

We are looking for an engineer to design and implement the dataset infrastructure that powers OpenAI’s next-generation training stack. You will be responsible for building standardized dataset interfaces, scaling pipelines across thousands of GPUs, and proactively testing for performance bottlenecks. In this role, you will collaborate closely with multimodal researchers and other infra groups to ensure datasets are unified, efficient, and easy to consume.

In this role, you will:

Design and maintain standardized dataset APIs, including for multimodal (MM) data that cannot fit in memory.

Build proactive testing and scale validation pipelines for dataset loading at GPU scale.

Collaborate with teammates to integrate datasets seamlessly into training and inference pipelines, ensuring smooth adoption and a great user experience.

Document and maintain dataset interfaces so they are discoverable, consistent, and easy for other teams to adopt.

Establish safeguards and validation systems to ensure datasets remain reproducible and unchanged once standardized.

Debug and resolve performance bottlenecks in distributed dataset loading (e.g., straggler systems slowing global training).

Provide visualization and inspection tools to surface errors, bugs, or bottlenecks in datasets.

You might thrive in this role if you:

Have strong engineering fundamentals with experience in distributed systems, data pipelines, or infrastructure.

Have experience building APIs, modular code, and scalable abstractions, while recognizing that abstractions ultimately serve their users and that UX is an important part of abstraction design.

Are comfortable debugging bottlenecks across large fleets of machines.

Take pride in building infrastructure that “just works,” and find joy in being the guardian of reliability and scale.

Are collaborative, humble, and excited to own a foundational (if not glamorous) part of the ML stack.

Bonus points if you:

Have background knowledge in data math, probability, or distributed data theory.

Have worked with GPU-scale distributed systems or dataset scaling for real-time data.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic.

For additional information, please see OpenAI’s Affirmative Action and Equal Employment Opportunity Policy Statement.

Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.

To notify OpenAI that you believe this job posting is non-compliant, please submit a report through this form. No response will be provided to inquiries unrelated to job posting compliance.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
