Site Reliability Engineer

Amiri Recruiting • Mountain View, CA, United States

No longer accepting applications

1 day ago

Job type

Full-time

Job description

Onsite: Bay Area, CA

Relevant Skills and Experience

What You’ll Do (Day-to-Day)

Own and manage our cloud infrastructure (GCP or AWS, on-prem).

Build, maintain, and optimize Kubernetes clusters (including GPU-backed clusters).

Implement and improve CI/CD pipelines (GitHub Actions).

Write and maintain Infrastructure as Code (Terraform).

Monitor system health and performance using Grafana and other observability tools.

Ensure high availability, reliability, and uptime across platforms.

Handle infrastructure maintenance, upgrades, and scaling.

Administer and improve our platform architecture and apply general security best practices across the stack.

Note: This is an internal-facing role with no customer interaction.

Must-Have:

4+ years in SRE, DevOps, or Infrastructure Engineering

Solid experience with GCP or AWS (hybrid/on-prem a plus)

Experience with Kubernetes cluster management (GPU experience a bonus)

Hands-on experience with Terraform and CI/CD (GitHub Actions)

Experience with monitoring/observability (Grafana, etc.)

Strong understanding of high availability and infrastructure reliability

Familiarity with platform/cluster architecture and administration

Security mindset and ability to apply best practices

Nice-to-Have:

Startup experience (you enjoy building, not just maintaining)

Experience with scalable GPU infrastructure for AI/ML
