Site Reliability Engineer

PsiQuantum, Stanford, CA, United States

1 day ago

Job type

Full-time

Job description

Quantum computing holds the promise of humanity's mastery over the natural world, but only if we can build a real quantum computer. PsiQuantum is on a mission to build the first real, useful quantum computers, capable of delivering the world-changing applications that the technology has long promised. We know that means we will need to build a system with roughly 1 million qubits that supports fault-tolerant error correction within a scalable architecture and a data center footprint.

By harnessing the laws of quantum physics, quantum computers can provide exponential performance increases over today's most powerful supercomputers, offering the potential for extraordinary advances across a broad range of industries including climate, energy, healthcare, pharmaceuticals, finance, agriculture, transportation, materials design, and many more.

PsiQuantum has determined the fastest path to delivering a useful quantum computer, years earlier than the rest of the industry. Our architecture is based on silicon photonics, which gives us the ability to produce our components at Tier-1 semiconductor fabs such as GlobalFoundries. There we leverage high-volume semiconductor manufacturing processes, the same processes that are already producing billions of chips for telecom and consumer electronics applications. We also benefit from the quantum-mechanical reality that photons don't feel heat or electromagnetic interference, allowing us to take advantage of existing cryogenic cooling systems and industry-standard fiber connectivity.

In 2024, PsiQuantum announced two government-funded projects to support the build-out of our first Quantum Data Centers and utility-scale quantum computers in Brisbane, Australia and Chicago, Illinois. Both projects are backed by nations that understand quantum computing's potential impact and the need to scale this technology to unlock that potential. And we won't just be building the hardware, but also the fault-tolerant quantum applications that will provide industry-transforming results.

Quantum computing is not just an evolution of the decades-old advancement in compute power. It provides the key to mastering our future, not merely discovering it. The potential is enormous, and we have the plan to make it real. Come join us.

There's much more work to be done, and we are looking for exceptional talent to join us on this extraordinary journey!

Job Summary:

Join the OS/Platform team as a Site Reliability Engineer (SRE) and keep our services healthy, observable, and fast. Partnering with the Platform Engineering group, you'll own the day-to-day operation of our monitoring stack (Grafana, Prometheus, Loki, and Tempo), crafting dashboards that surface golden signals and drive real-time insight. You'll codify reliability through SLIs/SLOs, automate runbooks in Python, and lead incident response to maintain world-class uptime across both on-prem and AWS environments.

Responsibilities:

Define, implement, and iterate on Service Level Indicators and Service Level Objectives (SLIs/SLOs) and error budgets for critical services, with a focus on network reliability and data center interconnects.
Build and maintain Grafana dashboards that visualize golden signals (latency, traffic, errors, saturation), extending coverage to network telemetry such as packet loss, jitter, bandwidth utilization, and BGP/EVPN stability.
Operate and tune the observability pipeline (Prometheus, Loki, Tempo) to ensure scalable, low-latency telemetry ingestion and alerting for the networking as well as the compute layers.
Drive incident response: triage, mitigate, perform post-incident reviews, and implement preventive actions, particularly for network-related outages, congestion, or misconfigurations.
Develop automation and self-service tooling in Python/Bash to streamline alerts, runbooks, and operational tasks, including network monitoring and diagnostics.
Collaborate with Platform, Product, and Networking teams on capacity planning, performance testing, traffic engineering, and change management.
Improve CI/CD health checks and release safety nets within GitLab, with attention to network dependencies in deployments.
Contribute to Infrastructure as Code (Terraform, Ansible) for monitoring stack deployments and upgrades, including network observability tooling and configuration.

Experience/Qualifications:

Bachelor's degree or higher in Computer Science, Engineering, or a related technical field.

5+ years in an SRE, DevOps, or Production Engineering role supporting distributed systems in production.

Hands-on expertise with observability tools: Grafana, Prometheus, Loki, Tempo (or equivalent).

Proven track record designing dashboards and alerts around golden signals and USE/RED methodologies, extended to network utilization, saturation, and error metrics.

Solid scripting/automation skills in Python and Bash; familiarity with GitLab CI pipelines.

Operational experience with Kubernetes and containerized workloads.

Strong working knowledge of AWS services, data center networking fundamentals, routing protocols, load balancing, and network overlays (e.g., VXLAN/EVPN).

Experience running incident response and writing actionable post-mortems, including for network-related events.

Familiarity with Infrastructure as Code (Terraform, Ansible) and configuration management.

Exposure to regulated environments, multi-region networking architectures, and hybrid on-prem/cloud topologies is a plus.

Strong communication and collaboration skills; comfortable acting as a generalist across infrastructure, networking, application, and data layers.

PsiQuantum provides equal employment opportunity for all applicants and employees. PsiQuantum does not unlawfully discriminate on the basis of race, color, religion, sex (including pregnancy, childbirth, or related medical conditions), gender identity, gender expression, national origin, ancestry, citizenship, age, physical or mental disability, military or veteran status, marital status, domestic partner status, sexual orientation, genetic information, or any other basis protected by applicable laws.

Note: PsiQuantum will only reach out to you using an official PsiQuantum email address and will never ask you for bank account information as part of the interview process. Please report any suspicious activity to recruiting@psiquantum.com.

We are not accepting unsolicited resumes from employment agencies.

The ranges below reflect the target ranges for a new hire base salary. One is for the Bay Area (within 50 miles of HQ, Palo Alto); the second one (if applicable) is for elsewhere in the US (beyond 50 miles of HQ, Palo Alto). If there is only one range, it is for the specific location where the position will be located. Actual compensation may vary outside of these ranges and is dependent on various factors including, but not limited to, a candidate's qualifications, including relevant education and training, competencies, experience, geographic location, and business needs. Base pay is only one part of the total compensation package. Full-time roles are eligible for equity and benefits. Base pay is subject to change and may be modified in the future.

U.S. Base Pay Range: $120,000 to $140,000 USD

Bay Area Pay Range: $145,000 to $165,000 USD
