Senior Systems Engineer, Infrastructure & Platform Reliability

Lambda, San Francisco, CA, United States

4 hours ago

Job type

Full-time

Job description

Lambda, The Superintelligence Cloud, builds Gigawatt-scale AI Factories for Training and Inference. Lambda's mission is to make compute as ubiquitous as electricity and give every person access to artificial intelligence. One person, one GPU.

If you'd like to build the world's best deep learning cloud, join us.

Note: This position requires presence in our San Francisco or San Jose office 4 days per week; Lambda's designated work-from-home day is currently Tuesday.

Information Systems at Lambda is responsible for building and scaling the internal systems that power our business. We partner across the company (Finance, GTM, Engineering, and People) to implement tools, automate workflows, and ensure data flows securely and accurately. Our scope includes enterprise applications, integrations, data platform and analytics, compliance automation, and all things IT.

What You'll Do

Design, write, and deliver software and services to improve the availability, scalability, reliability, and efficiency of Lambda's internal IT systems and platforms.

Solve problems relating to mission-critical services and build automation to prevent their recurrence, with the goal of automating the response to all non-exceptional events.

Work with Lambda Engineering and internal teams to influence and create new designs, architectures, standards, and methods for large-scale distributed systems.

Engage in service capacity planning and demand forecasting, software performance analysis, and system tuning.

Be an excellent communicator, producing documentation and related artifacts for the systems you are responsible for.

You

Have a keen interest in system design and in architecting for performance and scalability, along with experience with multiple cloud infrastructure platforms (AWS, GCP, Azure, etc.).

Think carefully about systems: edge cases, failure modes, behaviors, and specific implementations.

Know and prefer configuration management systems and toolchains (Chef, Ansible, Terraform, GitHub Actions, etc.).

Have solid programming skills: Python, Go, etc.

Have an urge to collaborate and communicate asynchronously, combined with a desire to record and document issues and solutions.

Have an enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it.

Have an urge to deliver quickly and effectively, and to iterate fast.

Nice to Have

Experience and interest in ML / AI workloads and compute

Practical experience implementing and managing paging, alerting, and on-call scheduling flows

A positive attitude, combined with a desire to learn and collaborate

Salary Range Information

The annual salary range for this position has been set based on market data and other factors. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

About Lambda

Founded in 2012, ~400 employees (2025) and growing fast

We offer generous cash & equity compensation

Our investors include Andra Capital, SGW, Andrej Karpathy, ARK Invest, Fincadia Advisors, G Squared, In-Q-Tel (IQT), KHK & Partners, NVIDIA, Pegatron, Supermicro, Wistron, Wiwynn, US Innovative Technology, Gradient Ventures, Mercato Partners, SVB, 1517, Crescent Cove.

We are experiencing extremely high demand for our systems, with quarter-over-quarter and year-over-year profitability

Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG

Health, dental, and vision coverage for you and your dependents

Wellness and Commuter stipends for select roles

401(k) plan with 2% company match (USA employees)

Flexible Paid Time Off Plan that we all actually use

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.
