Talent.com
Machine Learning Systems Engineer
Menlo Ventures • Berkeley, CA, United States
2 days ago
Job type
  • Full-time
Job description

Who We Are

At RelationalAI, we are building the future of intelligent data systems through our cloud-native relational knowledge graph management system—a platform designed for learning, reasoning, and prediction.

We are a remote-first, globally distributed team with colleagues across six continents. From day one, we’ve embraced asynchronous collaboration and flexible schedules, recognizing that innovation doesn’t follow a 9-to-5.

We are committed to an open, transparent, and inclusive workplace. We value the unique backgrounds of every team member and believe in fostering a culture of respect, curiosity, and innovation. We support each other’s growth and success—and take the well‑being of our colleagues seriously. We encourage everyone to find a healthy balance that affords them a productive, happy life, wherever they choose to live.

We bring together engineers who love building core infrastructure, obsess over developer experience, and want to make complex systems scalable, observable, and reliable.

Machine Learning Systems Engineer

Location: Remote (San Francisco Bay Area / North America / South America)

Experience Level: 3+ years of experience in machine learning engineering or research

About ScalarLM

This role involves working closely with the ScalarLM framework and team.

ScalarLM unifies vLLM, Megatron‑LM, and HuggingFace for fast LLM training, inference, and self‑improving agents, all via an OpenAI‑compatible interface. It builds on the vLLM inference engine, the Megatron‑LM training framework, and the HuggingFace model hub, combining their capabilities into a single platform. Users can easily run LLM inference and training and build higher‑level applications such as agents with a twist: they can teach themselves new abilities via backpropagation.

ScalarLM is inspired by the work of Seymour Roger Cray (September 28, 1925 – October 5, 1996), an American electrical engineer and supercomputer architect who designed a series of computers that were the fastest in the world for decades, and founded Cray Research, which built many of these machines. Called "the father of supercomputing", Cray has been credited with creating the supercomputer industry.

It is a fully open source project (CC‑0 Licensed) focused on democratizing access to cutting‑edge LLM infrastructure that combines training and inference in a unified platform, enabling the development of self‑improving AI agents similar to DeepSeek R1.

ScalarLM is supported and maintained by TensorWave in addition to RelationalAI.

The Role

As a Machine Learning Systems Engineer, you will contribute directly to our machine learning infrastructure and to the ScalarLM open source codebase, and you will build large‑scale language model applications on top of them. You’ll operate at the intersection of high-performance computing, distributed systems, and cutting‑edge machine learning research, developing the fundamental infrastructure that enables researchers and organizations worldwide to train and deploy large language models at scale.

This is an opportunity to take on technically demanding projects, contribute to foundational systems, and help shape the next generation of intelligent computing.

You Will

  • Contribute code and performance improvements to the open source project.
  • Develop and optimize distributed training algorithms for large language models.
  • Implement high‑performance inference engines and optimization techniques.
  • Work on integration between vLLM, Megatron‑LM, and HuggingFace ecosystems.
  • Build tools for seamless model training, fine‑tuning, and deployment.
  • Optimize performance on advanced GPU architectures.
  • Collaborate with the open source community on feature development and bug fixes.
  • Research and implement new techniques for self‑improving AI agents.

Who You Are

Technical Skills

  • Programming Languages: Proficiency in both C / C++ and Python
  • High Performance Computing: Deep understanding of HPC concepts, including:
      • MPI (Message Passing Interface) programming and optimization
      • Bulk Synchronous Parallel (BSP) computing models
      • Multi‑GPU and multi‑node distributed computing
      • CUDA / ROCm programming experience preferred
  • Machine Learning Foundations:
      • Solid understanding of gradient descent and backpropagation algorithms
      • Experience with transformer architectures and the ability to explain their mechanics
      • Knowledge of deep learning training and its applications
      • Understanding of distributed training techniques (data parallelism, model parallelism, pipeline parallelism, large batch training, optimization)
Research and Development

  • Publications: Experience with machine learning research and publications preferred
  • Research Skills: Ability to read, understand, and implement techniques from recent ML research papers
  • Open Source: Demonstrated commitment to open source development and community collaboration
Experience

  • 3+ years of experience in machine learning engineering or research.
  • Experience with large-scale distributed training frameworks (Megatron‑LM, DeepSpeed, FairScale, etc.).
  • Familiarity with inference optimization frameworks (vLLM, TensorRT, etc.).
  • Experience with containerization (Docker, Kubernetes) and cluster management.
  • Background in systems programming and performance optimization.
Bonus Points

  • PhD or MS in Computer Science, Computer Engineering, Machine Learning, or a related field.
  • Experience with SLURM, Kubernetes, or other cluster orchestration systems.
  • Knowledge of mixed precision training, data parallel training, and scaling laws.
  • Experience with transformer architectures, PyTorch, and decoding algorithms.
  • Familiarity with the high-performance GPU programming ecosystem.
  • Previous contributions to major open source ML projects.
  • Experience with MLOps and model deployment at scale.
  • Understanding of modern attention mechanisms (multi‑head attention, grouped query attention, etc.).
Why RelationalAI

    RelationalAI is committed to an open, transparent, and inclusive workplace. We value the unique backgrounds of our team. We are driven by curiosity, value innovation, and help each other to succeed and to grow. We take the well‑being of our colleagues seriously, and offer flexible working hours so each individual can find a healthy balance that affords them a productive, happy life wherever they choose to live.

    🌎 Global Benefits at RelationalAI

    At RelationalAI, we believe that people do their best work when they feel supported, empowered, and balanced. Our benefits prioritize well‑being, flexibility, and growth, ensuring you have the resources to thrive both professionally and personally.

  • We are all owners in the company: you receive a competitive salary and equity.
  • Work from anywhere in the world.
  • Comprehensive benefits coverage, including global mental health support.
  • Open PTO – Take the time you need, when you need it.
  • Company Holidays, Your Regional Holidays, and RAI Holidays—where we take one Monday off each month, followed by a week without recurring meetings, giving you the time and space to recharge.
  • Paid parental leave – Supporting new parents as they grow their families.
  • We invest in your learning & development.
  • Regular team offsites and global events – Building strong connections across a remote team by bringing everyone together.
  • A culture of transparency & knowledge‑sharing – Open communication through team standups, fireside chats, and open meetings.
Country Hiring Guidelines

    RelationalAI hires around the world. All of our roles are remote; however, some locations might carry specific eligibility requirements.

    Because of this, understanding location & visa support helps us better prepare to onboard our colleagues.

    Our People Operations team can help answer any questions about location after starting the recruitment process.

    Privacy Policy

    EU residents applying for positions at RelationalAI can see our Privacy Policy here.

    California residents applying for positions at RelationalAI can see our Privacy Policy here.

    Equal Opportunity

    RelationalAI is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, color, gender identity or expression, marital status, national origin, disability, protected veteran status, race, religion, pregnancy, sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances.
