Talent.com
Software Engineer - Compute
Lambda • San Francisco, California, United States
30+ days ago
Job type
  • Full-time
Job description

In 2012, Lambda started with a crew of AI engineers publishing research at top machine-learning conferences. We began as an AI company built by AI engineers. That hasn't changed. Today, we're on a mission to be the world's top AI computing platform. We equip engineers with the tools to deploy AI that is fast, secure, affordable, and built to scale. Whether they need powerhouse GPU hardware on-site or the flexibility of cloud-based solutions, we've got the horsepower to make it happen. Lambda’s AI Cloud has been adopted by the world’s leading companies and research institutions including Anyscale, Rakuten, The AI Institute, and multiple enterprises with over a trillion dollars of market capitalization. Our goal is to make computation as effortless and ubiquitous as electricity.

If you'd like to build the world's best deep learning cloud, join us.

  • Note: This position requires presence in our San Francisco office 4 days per week; Lambda’s designated work-from-home day is currently Tuesday.

What You’ll Do

Join a functional sub-team of Compute at Lambda that is responsible for developing a critical internal testing system, which manages AI test workloads across large GPU compute clusters.

Improve code quality, internal validation, and support for new topologies in the testing system.

Work on scalability challenges, enabling the testing system to support very large-scale clusters.

Transition communication mechanisms from SSH to node agents, exploring ZeroMQ or Redis streams.

Fix bugs and operational blockers to enable smoother handoff to non-engineering teams.

Contribute to implementation efforts and collaborate with the team on high-level architecture and strategic direction.

Work closely with the HPC-Ops and other internal consumers of the testing system.

You

Have strong proficiency in Python, backed by 3-5 years of professional software development experience, ideally leaning towards 5 years.

Have a solid understanding of Go, with the ability to write efficient and maintainable code.

Have hands-on experience with containers (Docker preferred).

Are familiar with Kubernetes (a plus, but not required).

Are comfortable working with Linux-based systems in a distributed environment.

Have experience working with large, complex codebases and improving their maintainability.

Have learned from past large-scale technical mistakes and grown from them.

Can take ownership of an internal tool and drive it forward.

Are eager to learn on the job, especially regarding the testing system’s architecture.

Are adaptable and enjoy working in fast-moving, high-impact environments.

Communicate well with cross-functional teams and collaborate effectively on large-scale engineering efforts.

Nice to Have

Experience with Slurm or Kubernetes-based cluster management.

Familiarity with high-performance computing (HPC).

Understanding of GPU compute environments (CUDA knowledge is not required).

Interest in validating AI workloads in customer environments for debugging or preventative maintenance.

Salary Range Information

Based on market data and other factors, the annual salary range for this position is $170,000-$230,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

About Lambda

Founded in 2012, ~350 employees (2024) and growing fast

We offer generous cash & equity compensation

Our investors include Andra Capital, SGW, Andrej Karpathy, ARK Invest, Fincadia Advisors, G Squared, In-Q-Tel (IQT), KHK & Partners, NVIDIA, Pegatron, Supermicro, Wistron, Wiwynn, US Innovative Technology, Gradient Ventures, Mercato Partners, SVB, 1517, Crescent Cove.

We are experiencing extremely high demand for our systems, with quarter-over-quarter and year-over-year profitability

Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG

Health, dental, and vision coverage for you and your dependents

Commuter / Work from home stipends for select roles

401k Plan with 2% company match (USA employees)

Flexible Paid Time Off Plan that we all actually use

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.
