Talent.com
Distributed ML Systems Engineer - Inference

Together AI • San Francisco, CA, United States
30+ days ago
Job type
  • Full-time
Job description

About the Role

Together AI is seeking a Distributed ML Systems Engineer to design and build scalable machine learning systems that power our accelerated AI initiatives. This role involves developing large-scale, fault-tolerant distributed systems that handle high-load and high-performance requirements. If you are passionate about designing ML systems that operate at scale and eager to create impactful solutions, we want to hear from you. This position offers the chance to work closely with our AI researchers and infrastructure teams to ensure our systems are robust and efficient. Join us in shaping the future at Together AI!

Responsibilities

  • Design and build large-scale, distributed machine learning systems that are fault-tolerant and high-performance.
  • Develop and optimize distributed processing frameworks and storage systems.
  • Collaborate with researchers, engineers, and product managers to integrate ML systems into our infrastructure.
  • Conduct architecture and design reviews to ensure best practices in system design.
  • Implement robust monitoring and logging systems to ensure the health and performance of our ML systems.

Requirements

  • 3+ years of experience in building large-scale, fault-tolerant, high-performance distributed systems.
  • Strong programming skills in one or more of Python, Go, Rust, or C / C++.
  • Excellent understanding of low-level operating system concepts, including multi-threading, memory management, networking, and storage, as well as performance and scale.
  • Experience with cloud computing platforms (AWS, GCP, Azure, etc.) and large-scale infrastructure.
  • Strong problem-solving skills and ability to work in a fast-paced environment.
  • Preferred: Experience with Kubernetes
  • Preferred: Experience with PyTorch

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society. Together, we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI. Our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers and engineers in our journey to build the next-generation AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance, and other competitive benefits. The US base salary range for this full-time position is $160,000 - $230,000 + equity + benefits. Our salary ranges are determined by location, level, and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunities to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.

Please see our privacy policy at
