Talent.com
Research Engineer - ML Sys & Infra

Storm3, San Francisco, CA, United States
5 days ago
Job type
  • Full-time
Job description

This range is provided by Storm3. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more.

Base pay range

$200,000.00 / yr - $350,000.00 / yr

⚡ Research Engineer - ML Sys, Infra Optimization and Scaling

Come join a revolutionary AI research lab in the SF Bay Area that is poised to develop & publish high-impact breakthroughs across LLMs, RL and multimodal AI.

Opportunity to join as an early member of a rapidly growing team led by A-listers in tech & academia. Seeking engineers / researchers skilled in distributed training, inference optimization and scaling laws for large-scale deep learning clusters and infrastructure.

Responsibilities

  • Research & implement SOTA methods in high-performance parallel computing
  • Set up distributed training frameworks and extend AI optimizer frameworks
  • Write production-quality code for ML infra; ensure reliability at scale
  • Work collaboratively in an early-stage team to enhance the multi-GPU training setup

Requirements

  • MS or PhD in Comp Sci, Comp Engineering, or related
  • Experience with multi-node systems (Ray, Kubernetes) and distributed inference optimization
  • Experience working with CUDA and deep learning optimization in HPC environments
  • Proficiency in leveraging high-compute GPU clusters

Why apply

  • Opportunity to build out a new division at the forefront of AI innovation
  • FAANG-competitive salary & package
  • Work alongside superstars from FAANG labs & leading AI companies
  • Medical, dental and vision insurance
  • 📧 Interested in applying? Click the ‘Easy Apply’ button or email your resume to anir.gantugs@storm3.com

Related jobs

  • Senior Applied AI Engineer – ML for Systems & Infrastructure
    Databricks Inc., San Francisco, CA, United States
    Full-time
    The Applied AI team at Databricks sits at the forefront of advancing GenAI-powered products. Over the past years, we’ve launched Databric...
    Last updated: 30+ days ago

  • Machine Learning Research Engineer, Enterprise ML Systems
    Scale AI, Inc., San Francisco, CA, United States
    Full-time
    AI is becoming vitally important in every function of our society. At Scale, our mission is to accelerate the development of AI applications. For 9 years, Scale has been the leading AI data foundry, ...
    Last updated: 25 days ago

  • Machine Learning Research Engineer - Robotics
    Scale AI, Inc., San Francisco, CA, United States
    Full-time
    Scale's Robotics business unit is dedicated to solving the data bottleneck in Physical AI. This position will be a key contributor in conducting applied research in Robotics and developing ML pipeli...
    Last updated: 30+ days ago

  • Research Engineer, ML Systems (All Industry Levels)
    Character.AI, San Francisco, CA, United States
    Full-time
    Last updated: 30+ days ago

  • Research Engineer Machine Learning & Systems
    World Labs, San Francisco, California, United States
    Full-time
    We are looking for a versatile Research Engineer with a strong background in machine learning or 3D, software development, and systems design. This role is ideal for someone excited about bridging c...
    Last updated: 30+ days ago

  • Machine Learning Research Engineer - Robotics
    Scale AI, San Francisco, California, United States
    Full-time
    Scale’s Robotics business unit is dedicated to solving the data bottleneck in Physical AI. This position will be a key contributor in conducting applied research in Robotics and developing ML pipeli...
    Last updated: 30+ days ago

  • Distributed ML Systems Engineer - Inference
    Together AI, San Francisco, CA, United States
    Full-time
    Together AI is seeking a Distributed ML Systems Engineer to design and build scalable machine learning systems that power our accelerated AI initiatives. This role involves developing large-scale, f...
    Last updated: 30+ days ago

  • ML Research Engineer, ML Systems
    Scale AI, Inc., San Francisco, CA, United States
    Full-time
    Scale's ML platform (RLXF) team builds our internal distributed framework for large language model training and inference. The platform has been powering MLEs, researchers, data scientists and opera...
    Last updated: 30+ days ago

  • Machine Learning Systems Engineer, Encodings and Tokenization
    Anthropic, San Francisco, California, United States
    Full-time
    Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group ...
    Last updated: 30+ days ago

  • Software Engineer, Machine Learning Infrastructure
    Datologyai, Redwood City, California, United States
    Full-time
    Companies want to train their own large models on their own data. The current industry standard is to train on a random sample of your data, which is inefficient at best and actively harmful to mode...
    Last updated: 30+ days ago

  • Machine Learning Engineer - Intelligent Agents & Systems
    Zyphra, Palo Alto, California, United States
    Full-time
    Agentic Systems and Interaction projects. You will be at the forefront of building a next-generation desktop and browser-based agent that can autonomously navigate the web, interact with filesystems...
    Last updated: 30+ days ago

  • Machine Learning Data Engineer - Systems & Retrieval
    Zyphra, Palo Alto, California, United States
    Full-time
    This includes designing high-performance pipelines for collecting, transforming, indexing, and serving massive, heterogeneous datasets from raw ...
    Last updated: 30+ days ago

  • Software Engineer L4 / L5, Model Serving Systems, Machine Learning Platform
    Netflix, Los Gatos, California, United States
    Full-time
    Netflix is one of the world's leading entertainment services, with 283 million paid memberships in over 190 countries enjoying TV series, films and games across a wide variety of genres and lan...
    Last updated: 30+ days ago

  • ML Research Engineer, ML Systems Research San Francisco, CA
    Scale AI, Inc., San Francisco, CA, United States
    Full-time
    Join the team shaping the future of AI at Scale. Scale’s ML platform (RLXF) team builds our internal distributed framework for large language model training and inference. The platform has been power...
    Last updated: 1 day ago

  • Machine Learning Engineer - Collision Avoidance System
    Zoox, Foster City, California, United States
    Full-time
    The Collision Avoidance System (CAS) is responsible for detecting and reacting to imminent collision situations in support of our vehicle’s overall safety goals. CAS Perception is responsible for pr...
    Last updated: 30+ days ago

  • Senior Machine Learning Operations (MLOps) Engineer
    Bonfy-ai, Mountain View, California, United States
    Full-time
    AI is building the trust layer for generative AI. Our Adaptive Content Security platform detects and mitigates subtle risks embedded in large language model (LLM) outputs before they reach users. Fro...
    Last updated: 30+ days ago

  • ML Research Engineer, ML Systems
    Scale AI, San Francisco, CA, United States
    Full-time
    Scale's ML platform (RLXF) team builds our internal distributed framework for large language model training and inference. The platform has been powering MLEs, researchers, data scientists and opera...
    Last updated: 30+ days ago

  • Research Engineer - Machine Learning & Systems
    World Labs, San Francisco, CA, United States
    Full-time
    We are looking for a versatile Research Engineer with a strong background in machine learning or 3D, software development, and systems design. This role is ideal for someone excited about bridging c...
    Last updated: 30+ days ago