Inference Software Engineer - Collectives

ETCHED LLC • San Jose, CA, United States
30+ days ago
Job type
  • Full-time
Job description

About Etched

Etched is building the world's first AI inference system purpose-built for transformers, delivering over 10x higher performance and dramatically lower cost and latency than an NVIDIA B200. With Etched ASICs, you can build products that would be impossible with GPUs, like real-time video generation models and extremely deep and parallel chain-of-thought reasoning agents. Backed by hundreds of millions of dollars from top-tier investors and staffed by leading engineers, Etched is redefining the infrastructure layer for the fastest-growing industry in history.

Job Summary

Etched's Inference Software team maps models optimally onto Sohu's dataflow architecture and serves requests across multiple chips, hosts, and racks. We are seeking a highly skilled and motivated engineer to formalize and optimize our collectives (e.g., Send / Receive, AllReduce, Broadcast). You'll build software that delivers frontier inference performance to meet exponentially growing serving demand.

In this role, your core focus will be working across systems and research to realize Mixture-of-Experts (MoE) architectures on Sohu's system. You will play a key role in scaling out Sohu's nascent runtime, with a focus on collectives.

Key responsibilities

  • Formalize and optimize our collectives (e.g., Send / Receive, AllReduce, Broadcast)
  • Collaborate across systems and research teams to bring MoE architectures to Sohu's runtime
  • Optimize expert routing and communication layers using Sohu's collectives
  • Contribute to scaling and enhancing Sohu's runtime, including multi-node inference, intra-node execution, state management, and robust error handling
  • Develop tools for performance profiling and debugging, identifying bottlenecks and correctness issues

You may be a good fit if you have

  • Strong proficiency in Rust and / or C++; familiarity with PyTorch and / or JAX.
  • Experience designing / optimizing collectives (e.g. NCCL, MPI collectives, XLA collectives, etc.)
  • Strong systems knowledge, including Linux internals, accelerator architectures (e.g., GPUs, TPUs), high-speed interconnects (e.g., NVLink, InfiniBand) and RDMA
  • Solid understanding of distributed systems concepts, algorithms, and challenges, including consensus protocols, consistency models, and communication patterns
  • Experience analyzing performance traces and logs from distributed systems and ML workloads.
  • A knack for designing user-facing interfaces and libraries, and an appetite for finding that elusive optimum between performance and usability.
Strong candidates may also have experience with

  • Large language model architectures, particularly Mixture-of-Experts (MoE)
  • Network simulation techniques
  • Developing low-latency, high-performance applications using both kernel-level and user-space networking stacks
  • Porting applications to non-standard or accelerator hardware platforms
  • Contributing to runtime systems with complex, well-documented interfaces, such as distributed storage systems or machine learning runtimes
  • Building applications with extensive SIMD (Single Instruction, Multiple Data) optimizations for performance-critical paths
Benefits

  • Full medical, dental, and vision packages, with generous premium coverage
  • Housing subsidy of $2,000 / month for those living within walking distance of the office
  • Daily lunch and dinner in our office
  • Relocation support for those moving to West San Jose
Compensation Range

  • $175,000 - $275,000
How we're different

Etched believes in the Bitter Lesson. We think most of the progress in the AI field has come from using more FLOPs to train and run models, and the best way to get more FLOPs is to build model-specific hardware. Larger and larger training runs encourage companies to consolidate around fewer model architectures, which creates a market for single-model ASICs.

We are a fully in-person team in West San Jose, and we greatly value engineering skills. We do not have boundaries between engineering and research, and we expect all of our technical staff to contribute to both as needed.
