Talent.com
Site Reliability Engineer
PsiQuantum • Palo Alto, CA, United States
30+ days ago
Job type
  • Full-time
Job description

Quantum computing holds the promise of humanity's mastery over the natural world, but only if we can build a real quantum computer. PsiQuantum is on a mission to build the first real, useful quantum computers, capable of delivering the world-changing applications that the technology has long promised. We know that means we will need to build a system with roughly 1 million qubits that supports fault-tolerant error correction within a scalable architecture and a data center footprint.

By harnessing the laws of quantum physics, quantum computers can provide exponential performance increases over today's most powerful supercomputers, offering the potential for extraordinary advances across a broad range of industries including climate, energy, healthcare, pharmaceuticals, finance, agriculture, transportation, materials design, and many more.

PsiQuantum has determined the fastest path to delivering a useful quantum computer, years earlier than the rest of the industry. Our architecture is based on silicon photonics, which gives us the ability to produce our components at Tier-1 semiconductor fabs such as GlobalFoundries, where we leverage high-volume semiconductor manufacturing processes, the same processes that are already producing billions of chips for telecom and consumer electronics applications. We also benefit from the quantum-mechanical reality that photons don't feel heat or electromagnetic interference, allowing us to take advantage of existing cryogenic cooling systems and industry-standard fiber connectivity.

In 2024, PsiQuantum announced two government-funded projects to support the build-out of our first Quantum Data Centers and utility-scale quantum computers in Brisbane, Australia and Chicago, Illinois. Both projects are backed by nations that understand quantum computing's potential impact and the need to scale this technology to unlock that potential. And we won't just be building the hardware, but also the fault-tolerant quantum applications that will provide industry-transforming results.

Quantum computing is not just an evolution of the decades-old advancement in compute power. It provides the key to mastering our future, not merely discovering it. The potential is enormous, and we have the plan to make it real. Come join us.

There's much more work to be done and we are looking for exceptional talent to join us on this extraordinary journey!

Job Summary:

Join the OS/Platform team as a Site Reliability Engineer (SRE) and keep our services healthy, observable, and fast. Partnering with the Platform Engineering group, you'll own the day-to-day operation of our monitoring stack (Grafana, Prometheus, Loki, and Tempo), crafting dashboards that surface golden signals and drive real-time insight. You'll codify reliability through SLIs/SLOs, automate runbooks in Python, and lead incident response to maintain world-class uptime across both on-prem and AWS environments.

Responsibilities:

  • Define, implement, and iterate on Service Level Indicators & Service Level Objectives (SLIs/SLOs) and error budgets for critical services, with a focus on network reliability and data centre interconnects.
  • Build and maintain Grafana dashboards that visualize golden signals (latency, traffic, errors, saturation), extending coverage to network telemetry such as packet loss, jitter, bandwidth utilization, and BGP/EVPN stability.
  • Operate and tune the observability pipeline (Prometheus, Loki, Tempo) to ensure scalable, low-latency telemetry ingestion and alerting for networking as well as compute layers.
  • Drive incident response: triage, mitigate, perform post-incident reviews, and implement preventive actions, particularly for network-related outages, congestion, or misconfigurations.
  • Develop automation and self-service tooling in Python/Bash to streamline alerts, runbooks, and operational tasks, including network monitoring and diagnostics.
  • Collaborate with Platform, Product, and Networking teams on capacity planning, performance testing, traffic engineering, and change management.
  • Improve CI/CD health checks and release safety nets within GitLab, with attention to network dependencies in deployments.
  • Contribute to Infrastructure as Code (Terraform, Ansible) for monitoring stack deployments and upgrades, including network observability tooling and configuration.

Experience / Qualifications:

  • Bachelor's Degree or higher in Computer Science, Engineering, or related technical field.
  • 5+ years in an SRE, DevOps, or Production Engineering role supporting distributed systems in production.
  • Hands-on expertise with observability tools: Grafana, Prometheus, Loki, Tempo (or equivalent).
  • Proven track record designing dashboards and alerts around golden signals and USE/RED methodologies, extended to network utilization, saturation, and error metrics.
  • Solid scripting/automation skills in Python and Bash; familiarity with GitLab CI pipelines.
  • Operational experience with Kubernetes and containerized workloads.
  • Strong working knowledge of AWS services, data centre networking fundamentals, routing protocols, load balancing, and network overlays (e.g., VXLAN/EVPN).
  • Experience running incident response and writing actionable post-mortems, including for network-related events.
  • Familiarity with Infrastructure as Code (Terraform, Ansible) and configuration management.
  • Exposure to regulated environments, multi-region networking architectures, and hybrid on-prem/cloud topologies is a plus.
  • Strong communication and collaboration skills; comfortable acting as a generalist across infrastructure, networking, application, and data layers.

PsiQuantum provides equal employment opportunity for all applicants and employees. PsiQuantum does not unlawfully discriminate on the basis of race, color, religion, sex (including pregnancy, childbirth, or related medical conditions), gender identity, gender expression, national origin, ancestry, citizenship, age, physical or mental disability, military or veteran status, marital status, domestic partner status, sexual orientation, genetic information, or any other basis protected by applicable laws.

Note: PsiQuantum will only reach out to you using an official PsiQuantum email address and will never ask you for bank account information as part of the interview process. Please report any suspicious activity to recruiting@psiquantum.com.

We are not accepting unsolicited resumes from employment agencies.

The ranges below reflect the target ranges for a new hire base salary. One is for the Bay Area (within 50 miles of HQ, Palo Alto), the second one (if applicable) is for elsewhere in the US (beyond 50 miles of HQ, Palo Alto). If there is only one range, it is for the specific location of where the position will be located. Actual compensation may vary outside of these ranges and is dependent on various factors including but not limited to a candidate's qualifications including relevant education and training, competencies, experience, geographic location, and business needs. Base pay is only one part of the total compensation package. Full-time roles are eligible for equity and benefits. Base pay is subject to change and may be modified in the future.

U.S. Base Pay Range

$120,000 – $140,000 USD

Bay Area Pay Range

$145,000 – $165,000 USD
