Talent.com
Site Reliability Engineer
No longer accepting applications
Amiri Recruiting • Mountain View, CA, United States
1 day ago
Job type
  • Full-time
Job description

Site Reliability Engineer

Onsite: Bay Area, CA

Relevant Skills and Experience

What You’ll Do (Day-to-Day)

  • Own and manage our cloud infrastructure (GCP or AWS, and on-prem).
  • Build, maintain, and optimize Kubernetes clusters (including GPU-backed clusters).
  • Implement and improve CI/CD pipelines (GitHub Actions).
  • Write and maintain Infrastructure as Code (Terraform).
  • Monitor system health and performance using Grafana and other observability tools.
  • Ensure high availability, reliability, and uptime across platforms.
  • Handle infrastructure maintenance, upgrades, and scaling.
  • Administer and improve our platform architecture and apply general security best practices across the stack.

Note: This is an internal-facing role with no customer interaction.

Must-Have:

  • 4+ years in SRE, DevOps, or Infrastructure Engineering
  • Solid experience with GCP or AWS (hybrid/on-prem a plus)
  • Experience with Kubernetes cluster management (GPU experience a bonus)
  • Hands-on experience with Terraform and CI/CD (GitHub Actions)
  • Experience with monitoring/observability (Grafana, etc.)
  • Strong understanding of high availability and infrastructure reliability
  • Familiarity with platform/cluster architecture and administration
  • Security mindset and ability to apply best practices

Nice-to-Have:

  • Startup experience (you enjoy building, not just maintaining)
  • Experience with scalable GPU infrastructure for AI/ML
