Distributed ML Systems Engineer - Inference

Together AI
San Francisco, CA, United States
30+ days ago
Job type
  • Full-time
Job description

About the Role

Together AI is seeking a Distributed ML Systems Engineer to design and build scalable machine learning systems that power our accelerated AI initiatives. This role involves developing large-scale, fault-tolerant distributed systems that handle high-load and high-performance requirements. If you are passionate about designing ML systems that operate at scale and eager to create impactful solutions, we want to hear from you. This position offers the chance to work closely with our AI researchers and infrastructure teams to ensure our systems are robust and efficient. Join us in shaping the future at Together AI!

Responsibilities

  • Design and build large-scale, distributed machine learning systems that are fault-tolerant and high-performance.
  • Develop and optimize distributed processing frameworks and storage systems.
  • Collaborate with researchers, engineers, and product managers to integrate ML systems into our infrastructure.
  • Conduct architecture and design reviews to ensure best practices in system design.
  • Implement robust monitoring and logging systems to ensure the health and performance of our ML systems.

Requirements

  • 3+ years of experience in building large-scale, fault-tolerant, high-performance distributed systems.
  • Strong programming skills in one or more of Python, Go, Rust, or C/C++.
  • Excellent understanding of low-level operating system concepts, including multi-threading, memory management, networking, storage, performance, and scale.
  • Experience with cloud computing platforms (AWS, GCP, Azure, etc.) and large-scale infrastructure.
  • Strong problem-solving skills and ability to work in a fast-paced environment.
  • Preferred: Experience with Kubernetes
  • Preferred: Experience with PyTorch

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society. Together, we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI. Our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers and engineers in our journey to build the next-generation AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance, and other competitive benefits. The US base salary range for this full-time position is $160,000 - $230,000 + equity + benefits. Our salary ranges are determined by location, level, and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunities to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.
