Research Engineer - Distributed Training

Prime Intellect, San Francisco, CA, United States

30+ days ago

Job type

Full-time

Job description

Building Open Superintelligence Infrastructure

Prime Intellect is building the open superintelligence stack - from frontier agentic models to the infra that enables anyone to create, train, and deploy them. We aggregate and orchestrate global compute into a single control plane and pair it with the full RL post-training stack: environments, secure sandboxes, verifiable evals, and our async RL trainer. We enable researchers, startups, and enterprises to run end-to-end reinforcement learning at frontier scale, adapting models to real tools, workflows, and deployment contexts.

As a Research Engineer working on Distributed Training, you'll play a crucial role in shaping our technological direction, focusing on our decentralized AI training stack. If you love scaling things and maximizing training efficiency, this role is for you.

Responsibilities

Lead and participate in novel research to build a massive-scale, highly reliable, and secure decentralized training orchestration solution.

Optimize the performance, cost, and resource utilization of AI workloads by leveraging the most recent advances in compute and memory optimization techniques.

Contribute to the development of our open-source libraries and frameworks for distributed model training.

Publish research in top-tier AI conferences such as ICML & NeurIPS.

Distill highly technical project outcomes into approachable technical blog posts for our customers and developers.

Stay up-to-date with the latest advancements in AI / ML infrastructure and tools and in decentralized training research, and proactively identify opportunities to enhance our platform's capabilities and user experience.

Requirements

Strong background in AI / ML engineering, with extensive experience in designing and implementing end-to-end pipelines for training and deploying large-scale AI models.

Deep expertise in distributed training techniques, frameworks (e.g., PyTorch Distributed, DeepSpeed, MosaicML’s LLM Foundry), and tools (e.g., Ray) for optimizing the performance and scalability of AI workloads.

Experience in large-scale model training, including distributed training techniques such as data, tensor, and pipeline parallelism.

Solid understanding of MLOps best practices, including model versioning, experiment tracking, and continuous integration / deployment (CI / CD) pipelines.

Passion for advancing the state-of-the-art in decentralized AI model training and democratizing access to AI capabilities for researchers, developers, and businesses worldwide.

If you're not familiar with these but feel that you can contribute to our mission and you're a high-energy person, get familiar with these resources (here, here and here) and please reach out!

Benefits & Perks

Competitive compensation, including equity incentives, aligning your success with the growth and impact of Prime Intellect.

Flexible work arrangements, with the option to work remotely or in person at our offices in San Francisco.

Visa sponsorship and relocation assistance for international candidates.

Quarterly team off-sites, hackathons, conferences and learning opportunities.

Opportunity to work with a talented, hard-working and mission-driven team, united by a shared passion for leveraging technology to accelerate science and AI.

We recently raised $15 million in funding (for a total of $20 million raised) led by Founders Fund, with participation from Menlo Ventures and prominent angels including Andrej Karpathy (Eureka AI, Tesla, OpenAI), Tri Dao (Chief Scientific Officer of Together AI), Dylan Patel (SemiAnalysis), Clem Delangue (Hugging Face), Emad Mostaque (Stability AI), and many others.

If you're excited about the opportunity to build the foundation for the future of decentralized AI and create a platform that empowers developers and researchers to push the boundaries of what's possible, we'd love to hear from you.

