Talent.com
Site Reliability Engineer
No longer accepting applications

PsiQuantum • Palo Alto, CA, United States
25 days ago
Contract type
  • Full-time
Job description

Quantum computing holds the promise of humanity’s mastery over the natural world, but only if we can build a real quantum computer. PsiQuantum is on a mission to build the first real, useful quantum computers, capable of delivering the world-changing applications that the technology has long promised. We know that means building a system with roughly 1 million qubits that supports fault-tolerant error correction within a scalable architecture and a data center footprint.

By harnessing the laws of quantum physics, quantum computers can provide exponential performance increases over today’s most powerful supercomputers on certain classes of problems, offering the potential for extraordinary advances across a broad range of industries including climate, energy, healthcare, pharmaceuticals, finance, agriculture, transportation, materials design, and many more.

PsiQuantum has determined the fastest path to delivering a useful quantum computer, years earlier than the rest of the industry. Our architecture is based on silicon photonics, which lets us produce our components at Tier-1 semiconductor fabs such as GlobalFoundries, where we leverage high-volume semiconductor manufacturing processes: the same processes that already produce billions of chips for telecom and consumer electronics applications. We also benefit from the physical reality that photons don’t feel heat or electromagnetic interference, which allows us to take advantage of existing cryogenic cooling systems and industry-standard fiber connectivity.

In 2024, PsiQuantum announced two government-funded projects to support the build-out of our first Quantum Data Centers and utility-scale quantum computers in Brisbane, Australia and Chicago, Illinois. Both projects are backed by nations that understand quantum computing’s potential impact and the need to scale this technology to unlock that potential. We won’t just be building the hardware; we will also build the fault-tolerant quantum applications that will deliver industry-transforming results.

Quantum computing is not just the next step in decades of advances in compute power. It provides the key to mastering our future, not merely discovering it. The potential is enormous, and we have the plan to make it real. Come join us.

There’s much more work to be done and we are looking for exceptional talent to join us on this extraordinary journey!

Job Summary

Join the OS/Platform team as a Site Reliability Engineer (SRE) and keep our services healthy, observable, and fast. Partnering with the Platform Engineering group, you’ll own the day-to-day operation of our monitoring stack (Grafana, Prometheus, Loki, and Tempo), crafting dashboards that surface golden signals and drive real-time insight. You’ll codify reliability through SLIs/SLOs, automate runbooks in Python, and lead incident response to maintain world-class uptime across both on-prem and AWS environments.

Responsibilities

  • Define, implement, and iterate on Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets for critical services.
  • Build and maintain Grafana dashboards that visualize golden signals (latency, traffic, errors, saturation) for engineers and stakeholders.
  • Operate and tune our observability pipeline (Prometheus, Loki, Tempo) to ensure scalable, low‑latency telemetry ingestion and alerting.
  • Drive incident response: triage, mitigate, perform post-incident reviews, and implement preventive actions.
  • Develop automation and self-service tooling in Python/Bash to streamline alerts, runbooks, and operational tasks.
  • Collaborate with Platform and Product teams on capacity planning, performance testing, and change management.
  • Improve CI/CD health checks and release safety nets within GitLab.
  • Contribute to infrastructure as code (Terraform, Ansible) for monitoring stack deployments and upgrades.

Experience / Qualifications

  • Bachelor’s degree or higher in Computer Science, Engineering, or a related technical field.
  • 5+ years in an SRE, DevOps, or Production Engineering role supporting distributed systems in production.
  • Hands-on expertise with observability tools: Grafana, Prometheus, Loki, Tempo (or equivalent).
  • Proven track record designing dashboards and alerts around golden signals and the USE (Utilization, Saturation, Errors) and RED (Rate, Errors, Duration) methodologies.
  • Solid scripting/automation skills in Python and Bash; familiarity with GitLab CI pipelines.
  • Operational experience with Kubernetes and containerized workloads.
  • Working knowledge of AWS services, networking fundamentals, and load balancing.
  • Experience running incident response and writing actionable post‑mortems.
  • Familiarity with Infrastructure as Code (Terraform, Ansible) and configuration management.
  • Exposure to regulated environments and multi‑region architectures is a plus.
  • Strong communication and collaboration skills; comfortable acting as a generalist across infrastructure, application, and data layers.

PsiQuantum provides equal employment opportunity for all applicants and employees. PsiQuantum does not unlawfully discriminate on the basis of race, color, religion, sex (including pregnancy, childbirth, or related medical conditions), gender identity, gender expression, national origin, ancestry, citizenship, age, physical or mental disability, military or veteran status, marital status, domestic partner status, sexual orientation, genetic information, or any other basis protected by applicable laws.

Note: PsiQuantum will only reach out to you using an official PsiQuantum email address and will never ask you for bank account information as part of the interview process. Please report any suspicious activity to recruiting@psiquantum.com.

We are not accepting unsolicited resumes from employment agencies.

The ranges below reflect the target ranges for a new-hire base salary. One range is for the Bay Area (within 50 miles of HQ in Palo Alto); the other (if applicable) is for elsewhere in the US (more than 50 miles from HQ). If there is only one range, it applies to the specific location where the position will be based. Actual compensation may fall outside these ranges and depends on various factors, including but not limited to a candidate's qualifications (including relevant education and training), competencies, experience, geographic location, and business needs. Base pay is only one part of the total compensation package. Full-time roles are eligible for equity and benefits. Base pay is subject to change and may be modified in the future.

U.S. Base Pay Range (outside the Bay Area)

$120,000–$140,000 USD

Bay Area Pay Range

$145,000–$165,000 USD

Seniority level

Mid-Senior level

Employment type

Full-time

Job function

Engineering and Information Technology

Industries

Computer Hardware Manufacturing
