Principal Site Reliability Engineer (TDP)

Palo Alto Networks | Santa Clara, CA, United States

1 day ago

Job type

Full-time

Job description

Our Mission

At Palo Alto Networks®, everything starts and ends with our mission:

Being the cybersecurity partner of choice, protecting our digital way of life.

Our vision is a world where each day is safer and more secure than the one before. We are a company built on the foundation of challenging and disrupting the way things are done, and we're looking for innovators who are as committed to shaping the future of cybersecurity as we are.

Who We Are

We believe collaboration thrives in person. That's why most of our teams work from the office full time, with flexibility when it's needed. This model supports real-time problem-solving, stronger relationships, and the kind of precision that drives great outcomes.

Your Career

Palo Alto Networks runs a large infrastructure and is one of the largest GCP customers. As a Principal Site Reliability Engineer for the TDP team, you will be part of a team supporting the services running on this infrastructure. This includes automation, architecture, performance, observability, troubleshooting, security, and reliability.

Our Infrastructure Platform stack includes Terraform, Kubernetes, GitLab CI/CD, GitOps, Prometheus, Grafana, Loki, Docker, GCP, ESO, Kafka, Neo4j, Spanner, MongoDB, Cassandra, BigQuery, Redshift, MySQL, Python, Bash, and Go.

Your Impact

Contribute to the success of SRE and DevOps

Develop expertise in new technologies

Work with developers, researchers, data scientists, and security experts

Design, build, and operate reliable, secure cloud infrastructure

Ensure that applications are production-ready, scalable, and reliable

Develop tools and automation frameworks

Automate the deployment of robust services

Orchestrate end-to-end monitoring and alerting

Participate with SRE and Dev teams in the on-call rotation

Lead root cause analysis of critical business and production issues

Design, implement, and maintain the company's database systems to ensure optimal performance, availability, and stability.

Safeguard sensitive data by implementing and managing robust security measures.

Develop and manage reliable backup and recovery strategies to prevent data loss and ensure business continuity.

Collaborate with development and IT teams to support applications and infrastructure that rely on the databases.

Proactively monitor database performance to identify and resolve bottlenecks, slow queries, and resource contention issues.

Optimize complex SQL queries, stored procedures, and database configurations (tuning).

Manage and optimize database objects, including tables, indexes, and schemas, to improve efficiency and responsiveness

Design, implement, and manage comprehensive backup and recovery procedures.

Perform regular testing of backups and restore procedures to ensure data can be recovered swiftly and accurately in a disaster scenario.

Develop and maintain disaster recovery plans and execute them during system outages.

Your Experience

6+ years as an engineer in Infrastructure, Operations, SRE, DevOps, or System Engineering

4+ years building highly available, scalable cloud-native applications on AWS and GCP

BS or MS in Computer Science or a related field, or equivalent professional or military experience, required

Expert proficiency in SQL and GraphQL.

Deep working knowledge of at least one major relational or graph database platform (e.g., Neo4j, Spanner, MySQL, PostgreSQL, AlloyDB).

Experience with database design, data modeling, and data warehousing concepts.

Strong understanding of backup, recovery, performance monitoring, and tuning techniques.

Expertise in configuration management with a framework such as Ansible, Terraform, Helm

Passion for infrastructure and monitoring as code

Solid experience in container workloads and Kubernetes

Familiarity with PKI and networking concepts

In-depth knowledge of different security controls (app-id, user-id, security profile, URL category, content, SSL decryption, firewall MFA, etc.)

Linux administration, internals, and network troubleshooting

Proficiency with programming languages such as Golang or Python, along with shell scripting, to automate tasks.

Proficiency with CI/CD pipelines, ArgoCD, and GitLab CI/CD.

Ability to diagnose and troubleshoot complex distributed systems handling high-volume transactions

Experience with managing Kafka is a plus

Excellent written and verbal communication, able to collaborate and rally support

Self-disciplined, self-managed, self-motivated, strong sense of ownership, urgency, and drive.

Ready to understand and dissect new technology stacks quickly

Experience with Cloud Database Services (e.g., Amazon RDS, Azure SQL Database, Google Cloud SQL).

Relevant professional certifications (e.g., Oracle Certified Professional (OCP), Microsoft Certified: Azure Database Administrator Associate).

Experience with NoSQL databases (e.g., MongoDB, Cassandra).

Familiarity with data governance and regulatory compliance standards (e.g., GDPR, HIPAA)

The Team

Our engineering team is at the core of our products - connected directly to the mission of preventing cyberattacks. We are constantly innovating - challenging the way we, and the industry, think about cybersecurity. Our engineers don't shy away from building products to solve problems no one has pursued before.

We define the industry, instead of waiting for directions. We need individuals who feel comfortable in ambiguity, excited by the prospect of a challenge, and empowered by the unknown risks facing our everyday lives that are only enabled by a secure digital environment.

Compensation Disclosure

The compensation offered for this position will depend on qualifications, experience, and work location. For candidates who receive an offer at the posted level, the starting base salary (for non-sales roles) or base salary + commission target (for sales/commissioned roles) is expected to be between $147,000 and $230,000 per year. The offered compensation may also include restricted stock units and a bonus. A description of our employee benefits may be found here.

Our Commitment

We're problem solvers that take risks and challenge cybersecurity's status quo. It's simple: we can't accomplish our mission without diverse teams innovating, together.

We are committed to providing reasonable accommodations for all qualified individuals with a disability. If you require assistance or accommodation due to a disability or special need, please contact us at accommodations@paloaltonetworks.com.

Palo Alto Networks is an equal opportunity employer. We celebrate diversity in our workplace, and all qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or other legally protected characteristics.

All your information will be kept confidential according to EEO guidelines.

Is this role eligible for immigration sponsorship? Yes
