Talent.com
Senior Software Engineer - Together Cloud Platform, San Francisco
No longer accepting applications
Together AI • San Francisco, CA, United States
7 days ago
Job type
  • Full-time
Job description
Senior Backend Engineer - Together Cloud

Together AI is building the AI Acceleration Cloud, an end-to-end platform for the full generative AI lifecycle, combining the fastest LLM inference engine with state-of-the-art AI cloud infrastructure.

As a Senior AI Infrastructure Engineer, you will play a key role in building the next-generation AI cloud platform: a highly available, global, blazing-fast cloud infrastructure that virtualizes cutting-edge ML hardware (GB200s/GB300s, BlueField DPUs) and provides state-of-the-art ML practitioners with self-serve AI cloud services, such as on-demand and managed Kubernetes and Slurm clusters. This platform serves both our internal SaaS products (inference, fine-tuning) and our external cloud customers, spanning dozens of data centers across the world.

Some of what you'll work on:

  • Design, build, and maintain performant, secure, and highly available backend services/operators that run in our data centers and automate hardware management, such as InfiniBand partitioning, in-DC parallel storage provisioning, and VM provisioning.
  • Design and build out the IaaS software layer for a new GB200 data center with thousands of GPUs.
  • Work on a global multi-exabyte high-performance object store, serving massive datasets for pretraining.
  • Build advanced observability stacks for our customers with automated node lifecycle management for fault-tolerant distributed pretraining.

To be successful, you'll need to be deeply technical and possess excellent communication, collaboration, and diplomacy skills. You have strong fundamental software development skills. In addition, you have strong systems knowledge and troubleshooting abilities.

Requirements

  • 5+ years of professional software development experience and proficiency in at least one backend programming language (Golang desired)
  • 5+ years of experience writing high-performance, well-tested, production-quality code
  • Demonstrated experience building and operating high-performance and/or globally distributed microservice architectures across one or more cloud providers (AWS, Azure, GCP)
  • Excellent communication skills: able to write clear design docs and work effectively with both technical and non-technical team members
  • Deep experience with Kubernetes internals a big plus, such as implementing non-trivial Kubernetes operators, device/storage/network plugins, or custom schedulers, or contributing patches to these or to Kubernetes itself
  • Deep experience with VMs/hypervisors a big plus, such as QEMU/KVM, cloud-hypervisor, VFIO, virtio, PCIe passthrough, KubeVirt, SR-IOV
  • Deep experience with data center networking technologies and solutions a big plus, such as VLAN, VXLAN, VPN, VPC, OVS/OVN
  • Experience with Cluster API or similar a big plus
  • Experience working on high-performance compute, networking, and/or storage a big plus
  • Experience virtualizing GPUs and/or InfiniBand a big plus
  • Strong systems knowledge across compute, networking, and storage, including concurrency, memory management, performant I/O, and scale
  • Experience with infrastructure automation tools (Terraform, Ansible), monitoring/observability stacks (Prometheus, Grafana), and CI/CD pipelines (GitHub Actions, ArgoCD)
  • Experience building IaaS or PaaS systems at scale a plus
  • Experience with DPUs/SmartNICs a plus
  • GPU programming, NCCL, CUDA knowledge a plus

Responsibilities

  • Perform architecture and research work for decentralized AI workloads
  • Work on the core, open-source Together AI platform
  • Create services, tools, and developer documentation
  • Create testing frameworks for robustness and fault-tolerance

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers on our journey to build the next generation of AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance, and other benefits, as well as flexibility in terms of remote work. The US base salary range for this full-time position is $160,000 - $230,000 + equity + benefits. Our salary ranges are determined by location, level, and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.
