Software Engineer, Data Infrastructure - Research

OpenAI
San Francisco, CA, United States
6 days ago
Job type
  • Full-time
Job description

About The Team

The Workload team is responsible for designing and running OpenAI’s LLM training and inference infrastructure that powers frontier models at massive scale. Our systems unify how researchers train and serve models, abstracting away the complexity of performance, parallelism, and execution across vast GPU and accelerator fleets. By providing this foundation, the Workload team ensures that researchers can focus on advancing model capabilities while we handle the scale, efficiency, and reliability required to bring those models to life.

About The Role

We are looking for an engineer to design and implement the dataset infrastructure that powers OpenAI’s next-generation training stack. You will be responsible for building standardized dataset interfaces, scaling pipelines across thousands of GPUs, and proactively testing for performance bottlenecks. In this role, you will collaborate closely with multimodal researchers and other infrastructure groups to ensure datasets are unified, efficient, and easy to consume.

In This Role, You Will

  • Design and maintain standardized dataset APIs, including for multimodal (MM) data that cannot fit in memory.
  • Build proactive testing and scale validation pipelines for dataset loading at GPU scale.
  • Collaborate with teammates to integrate datasets seamlessly into training and inference pipelines, ensuring smooth adoption and a great user experience.
  • Document and maintain dataset interfaces so they are discoverable, consistent, and easy for other teams to adopt.
  • Establish safeguards and validation systems to ensure datasets remain reproducible and unchanged once standardized.
  • Debug and resolve performance bottlenecks in distributed dataset loading (e.g., straggler machines that slow down global training).
  • Provide visualization and inspection tools to surface errors, bugs, or bottlenecks in datasets.

You Might Thrive In This Role If You

  • Have strong engineering fundamentals with experience in distributed systems, data pipelines, or infrastructure.
  • Have experience building APIs, modular code, and scalable abstractions, while recognizing that abstractions ultimately serve their users and that UX is an important part of abstraction design.
  • Are comfortable debugging bottlenecks across large fleets of machines.
  • Take pride in building infrastructure that “just works,” and find joy in being the guardian of reliability and scale.
  • Are collaborative, humble, and excited to own a foundational (if not glamorous) part of the ML stack.

Bonus Points If You

  • Have background knowledge in data math, probability, or distributed data theory.
  • Have worked with GPU-scale distributed systems or dataset scaling for real-time data.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic.

For additional information, please see OpenAI’s Affirmative Action and Equal Employment Opportunity Policy Statement.

Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.

To notify OpenAI that you believe this job posting is non-compliant, please submit a report through this form. No response will be provided to inquiries unrelated to job posting compliance.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.

Compensation Range: $250K - $380K
