Inference Software Engineer - Collectives

ETCHED LLC • San Jose, CA, United States

30+ days ago

Job type

Full-time

Job description

About Etched

Etched is building the world's first AI inference system purpose-built for transformers - delivering over 10x higher performance and dramatically lower cost and latency than a B200. With Etched ASICs, you can build products that would be impossible with GPUs, like real-time video generation models and extremely deep & parallel chain-of-thought reasoning agents. Backed by hundreds of millions from top-tier investors and staffed by leading engineers, Etched is redefining the infrastructure layer for the fastest growing industry in history.

Job Summary

Etched's Inference SW team enables optimal mapping of models to Sohu's dataflow architecture and serving requests across multiple chips, hosts and racks. We are seeking a highly skilled and motivated engineer to formalize and optimize our collectives (e.g. Send / Receive, AllReduce, Broadcast, etc.). You'll build SW enabling frontier inference performance to satisfy exponentially growing serving demand.

In this role, your core focus will be working across systems and research to realize Mixture-of-Experts (MoE) architectures on Sohu's system. You will play a key role in scaling out Sohu's nascent runtime, with a focus on collectives.

Key responsibilities

Formalize and optimize our collectives (e.g. Send / Receive, AllReduce, Broadcast, etc.)
Collaborate across systems and research teams to bring MoE architectures to Sohu's runtime
Optimize expert routing and communication layers using Sohu's collectives
Contribute to scaling and enhancing Sohu's runtime, including multi-node inference, intra-node execution, state management, and robust error handling
Develop tools for performance profiling and debugging, identifying bottlenecks and correctness issues

You may be a good fit if you have

Strong proficiency in Rust and / or C++; familiarity with PyTorch and / or JAX.

Experience designing / optimizing collectives (e.g. NCCL, MPI collectives, XLA collectives, etc.)

Strong systems knowledge, including Linux internals, accelerator architectures (e.g., GPUs, TPUs), high-speed interconnects (e.g., NVLink, InfiniBand) and RDMA

Solid understanding of distributed systems concepts, algorithms, and challenges, including consensus protocols, consistency models, and communication patterns

Experience analyzing performance traces and logs from distributed systems and ML workloads.

A knack for designing user-facing interfaces and libraries, and an appetite for finding that elusive optimum between performance and usability.

Strong candidates may also have experience with

Large language model architectures, particularly Mixture-of-Experts (MoE).

Network simulation techniques.

Developing low-latency, high-performance applications using both kernel-level and user-space networking stacks.

Porting applications to non-standard or accelerator hardware platforms.

Contributing to runtime systems with complex, well-documented interfaces, such as distributed storage systems or machine learning runtimes.

Building applications with extensive SIMD (Single Instruction, Multiple Data) optimizations for performance-critical paths.

Benefits

Full medical, dental, and vision packages, with generous premium coverage

Housing subsidy of $2,000 / month for those living within walking distance of the office

Daily lunch and dinner in our office

Relocation support for those moving to West San Jose

Compensation Range

$175,000 - $275,000

How we're different

Etched believes in the Bitter Lesson. We think most of the progress in the AI field has come from using more FLOPs to train and run models, and the best way to get more FLOPs is to build model-specific hardware. Larger and larger training runs encourage companies to consolidate around fewer model architectures, which creates a market for single-model ASICs.

We are a fully in-person team in West San Jose, and greatly value engineering skills. We do not have boundaries between engineering and research, and we expect all of our technical staff to contribute to both as needed.

Related jobs

Software Engineer - Large Scale Inference

The San Francisco Compute Company • San Francisco, CA, United States

Full-time

We think people should buy it like one. Startups shouldn’t be forced to buy a year’s worth of compute time in order to get market rate and compute providers shouldn’t go bankrupt because they can’t ...

Last updated: 30+ days ago

Software Engineer, Training & Inference Infrastructure

DatologyAI • Redwood City, CA, United States

Full-time

But a large portion of training compute is wasted training on data that are already learned, irrelevant, or even harmful, leading to worse models that cost more to train and deploy. At DatologyAI, w...

Last updated: 30+ days ago

Software Engineer - Applied Inference

xAI • Palo Alto, CA, United States

Full-time

AI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excelle...

Last updated: 2 days ago

AGI Sr Inference Software Development Engineering, AGI Inference

Amazon • Sunnyvale, CA, United States

Full-time

The Sensory Inference team at AGI is a group of innovative developers working on ground-breaking multi-modal inference solutions that revolutionize how AI systems perceive and interact with the wor...

Last updated: 30+ days ago

Software Engineer, Inference

algojobs • San Francisco, CA, United States

Full-time

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group ...

Last updated: 6 days ago

Senior Inference Software Engineer

Etched • San Jose, CA, United States

Full-time

Etched is building AI chips that are hard-coded for individual model architectures. Our first product (Sohu) only supports transformers, but has an order of magnitude more throughput and lower laten...

Last updated: 6 days ago

Software Engineer, Enterprise AI

Scale AI, Inc. • San Francisco, CA, United States

Full-time

Scale GP (Scale Generative AI Platform) is an enterprise-grade Generative AI platform that provides APIs for knowledge retrieval, inference, evaluation, and more. We are looking for a strong enginee...

Last updated: 30+ days ago

Software Engineer

Supermicro • San Jose, CA, United States

Full-time

Supermicro is a Top Tier provider of advanced server, storage, and networking solutions for Data Center, Cloud Computing, Enterprise IT, Hadoop / Big Data, Hyperscale, HPC and IoT / Embedded customers...

Last updated: 30+ days ago

Software Engineer - GenAI inference

Menlo Ventures • San Francisco, CA, United States

Full-time

As a software engineer for GenAI inference, you will help design, develop, and optimize the inference engine that powers Databricks’ Foundation Model API. You’ll work at the intersection of research...

Last updated: 11 days ago

Software Engineer, Model Inference

OpenAI • San Francisco, CA, United States

Full-time

Our Inference team brings OpenAI's most capable research and technology to the world through our products. We empower consumers, enterprise and developers alike to use and access our start-of-the-ar...

Last updated: 30+ days ago

Software Engineer, Inference

Trypulse • San Francisco, CA, United States

Full-time

Pulse is tackling one of the most persistent challenges in data infrastructure: extracting accurate, structured information from complex documents at scale. We have a breakthrough approach to docume...

Last updated: 30+ days ago

Senior Software Engineer, Inference Platform

MongoDB • Palo Alto, CA, United States

Full-time

MongoDB’s mission is to empower innovators to create, transform, and disrupt industries by unleashing the power of software and data. We enable organizations of all sizes to easily build, scale, and...

Last updated: 30+ days ago

Software Engineer, Optimus Inference Co Design

Tesla • Palo Alto, CA, United States

Full-time

The AI inference co-design team's goal is to take research models and make them run efficiently on our AI-ASIC to power real-time inference for Optimus humanoid robot programs, with applications ex...

Last updated: 30+ days ago

Software Engineer, Foundation Inference Infrastructure

Tesla • Palo Alto, CA, United States

Full-time

As a member of the Foundation Inference Infrastructure team, you will design & implement a diverse set of backend services and tools that power autonomy software and hardware development processes...

Last updated: 30+ days ago

Sr. Software Engineer (25403)

Supermicro • San Jose, CA, United States

Full-time

Last updated: 26 days ago