Talent.com
Senior Site Reliability Engineer


Gridware, San Francisco, CA, US
16 days ago
Job type
  • Full-time
Job description


About Gridware

Gridware is a San Francisco-based technology company dedicated to protecting and enhancing the electrical grid. We pioneered a groundbreaking new class of grid management called active grid response (AGR), focused on monitoring the electrical, physical, and environmental aspects of the grid that affect reliability and safety. Gridware’s advanced Active Grid Response platform uses high-precision sensors to detect potential issues early, enabling proactive maintenance and fault mitigation. This comprehensive approach helps improve safety, reduce outages, and ensure the grid operates efficiently. The company is backed by climate-tech and Silicon Valley investors. For more information, please visit www.Gridware.io.

Role Description

We are seeking a Senior Site Reliability Engineer to design, build, and maintain the infrastructure powering our modern, cloud-native applications. In this role, you will design and implement scalable and secure platforms on AWS, leveraging Kubernetes (EKS) and ArgoCD for GitOps-driven deployments. You’ll be responsible for building and optimizing CI/CD pipelines with GitHub Actions, managing event streaming with Amazon MSK, and maintaining reliable relational databases on RDS. You will own our Infrastructure as Code strategy with Terraform and drive best practices around security, identity management (IdP integrations), and cost optimization.

You will also play a key role in observability and platform reliability, building and maintaining monitoring and logging solutions with tools like Grafana, Loki, and Prometheus to ensure system performance and resilience. The successful candidate will work closely with our Cloud Security Engineer to enforce security standards, implement best practices, and ensure compliance across the infrastructure stack. This is a highly collaborative position where you’ll partner with engineering teams to deliver reliable environments, automate deployments, and improve developer velocity while staying ahead of modern DevOps and cloud-native practices.

What You’ll Do

  • Design, build, and maintain scalable, secure, and highly available infrastructure on AWS (EKS, EC2, RDS, MSK, S3, VPC …).
  • Manage and optimize Kubernetes clusters (EKS) and deploy applications using ArgoCD with GitOps best practices.
  • Implement and maintain CI/CD pipelines using GitHub Actions (GHA), ensuring fast, reliable, and automated software delivery.
  • Build and support Kafka-based event streaming platforms using Amazon MSK for high-throughput, low-latency data pipelines.
  • Manage identity and access across platforms with IdP integration (Okta, Auth0, or similar).
  • Define and manage Infrastructure as Code with Terraform.
  • Monitor, troubleshoot, and optimize system performance, cost, and reliability using observability tools like Grafana and Loki.

What We’re Looking For

  • 5+ years in DevOps / SRE / Platform Engineering, with production experience in AWS infrastructure management.
  • Deep knowledge of Kubernetes administration and GitOps tools like ArgoCD.
  • Proficiency in Infrastructure as Code with Terraform.
  • Hands-on experience with CI/CD automation and pipelines (preferably GitHub Actions).
  • Expertise in running and maintaining distributed systems such as Kafka on MSK and relational databases (RDS).
  • Strong understanding of networking, security best practices, and IdP-driven access control.
  • Experience with monitoring and logging solutions (Grafana, Loki, Prometheus, or similar).
  • Ability to debug complex production issues across infrastructure, deployment, and networking layers.

Bonus Points

  • Familiarity with Databricks or MLOps pipelines for data and model deployment.
  • Experience with Terragrunt.
  • Knowledge of multi-cloud or hybrid cloud environments and container security tools.

This describes the ideal candidate; many of us have picked up this expertise along the way. Even if you meet only part of this list, we encourage you to apply!

Benefits

  • Health, Dental & Vision (Gold and Platinum, with some providers' plans fully covered)
  • Paid parental leave
  • Alternating day off (every other Monday)
  • “Off the Grid”, a two-week paid break each year for all employees
  • Commuter allowance
  • Company-paid training
