Senior Staff Site Reliability Engineer (Cortex Observability)

Palo Alto Networks • Santa Clara, California, United States

30+ days ago

Job type

Full-time

Job description

Company Description

Our Mission

At Palo Alto Networks®, everything starts and ends with our mission:

Being the cybersecurity partner of choice, protecting our digital way of life.

Our vision is a world where each day is safer and more secure than the one before. We are a company built on the foundation of challenging and disrupting the way things are done, and we’re looking for innovators who are as committed to shaping the future of cybersecurity as we are.

Who We Are

We take our mission of protecting the digital way of life seriously. We are relentless in protecting our customers, and we believe that the unique ideas of every member of our team contribute to our collective success. Our values were crowdsourced by employees and are brought to life through each of us every day, from disruptive innovation and collaboration to execution, and from showing up for each other with integrity to creating an environment where we all feel included.

As a member of our team, you will be shaping the future of cybersecurity. We work fast, value ongoing learning, and respect each employee as a unique individual. Knowing we all have different needs, our development and personal wellbeing programs are designed to give you choice in how you are supported. These include our FLEXBenefits wellbeing spending account with over 1,000 eligible items selected by employees, our mental and financial health resources, and our personalized learning opportunities, just to name a few!

At Palo Alto Networks, we believe in the power of collaboration and value in-person interactions. This is why our employees generally work full time from our office with flexibility offered where needed. This setup fosters casual conversations, problem-solving, and trusted relationships. Our goal is to create an environment where we all win with precision.

Job Description

Your Career

The Cortex team builds and delivers the industry’s most advanced SecOps platform, consisting of XDR, XSIAM, XSOAR, and XPANSE. As a member of the Cortex DevOps team, you will operate and maintain a large-scale GCP environment, including designing, implementing, and continuously enhancing our comprehensive observability systems. To succeed in this role, you will need deep knowledge of modern observability and monitoring tools and practices, including hands-on experience managing high-cardinality metrics, implementing tracing, and operationalizing large-scale logging solutions. You will collaborate closely with our engineering teams to develop innovative solutions that provide clear, actionable insights into our systems’ performance and health.

Your Impact

As a Senior Staff SRE with the Cortex Observability team, you will:

Cloud Expertise: Use your expertise in monitoring cloud platforms, particularly GCP, to optimize our infrastructure with cloud-native technologies.
Monitoring Expertise: Improve monitoring processes, alerts, and metrics. Work with development teams to ensure that every service has the right monitoring and metrics in place, so that we detect problems before our customers do.
Incident Management: Apply incident management processes to resolve system issues efficiently and minimize impact on services.
Automation: Automate complex monitoring and alerting tasks by building tools for cloud operations, such as automated remediation of known issues and auto-scaling.
Continuously Improve: Stay up to date with cutting-edge technologies, evaluate their potential impact on our operations, and implement them when appropriate.
On-Call: Provide follow-the-sun operational coverage for our production Observability infrastructure.
Collaborate: Work with our Engineering team to influence the operability of the product and ensure the reliability and availability of our services.

Qualifications

Your Experience

DevOps / SRE Expertise: 5+ years of experience as a DevOps / SRE engineer, with a passion for technology and a strong drive for service-level reliability.

Observability Tools: High proficiency with Thanos, Prometheus, Grafana, OpenTelemetry, and other monitoring tools.

Incident and Alert Management: Clear understanding of incident and alert management using tools such as PagerDuty and Prometheus Alertmanager.

Cloud Proficiency: High proficiency in either Google Cloud Platform or Amazon Web Services.

Kubernetes and Docker: High proficiency with Kubernetes and Docker for containerization and container orchestration.

Scripting and Automation: High proficiency in Python programming and Linux shell commands. Experience with Ansible and Terraform for infrastructure as code.

Communication Skills: Effective communication and interpersonal skills, with the ability to work with and coordinate between multiple teams in different time zones.

Troubleshooting: Ability to troubleshoot and resolve emerging and complex problems effectively.

Independence: Ability to operate independently, make decisions, take action, and take responsibility.

Additional Information

The Team

We’re trailblazers who dream big, take risks, and challenge cybersecurity’s status quo. It’s simple: we can’t accomplish our mission without diverse teams innovating together.

Compensation Disclosure

The compensation offered for this position will depend on qualifications, experience, and work location. For candidates who receive an offer at the posted level, the starting base salary (for non-sales roles) or base salary + commission target (for sales / commissioned roles) is expected to be between $126K and $200K per year. The offered compensation may also include restricted stock units and a bonus. A description of our employee benefits may be found here.

Our Commitment

We’re problem solvers who take risks and challenge cybersecurity’s status quo. It’s simple: we can’t accomplish our mission without diverse teams innovating together.

We are committed to providing reasonable accommodations for all qualified individuals with a disability. If you require assistance or accommodation due to a disability or special need, please contact us at [email protected].

Palo Alto Networks is an equal opportunity employer. We celebrate diversity in our workplace, and all qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or other legally protected characteristics.

All your information will be kept confidential according to EEO guidelines.


Related jobs

Staff Site Reliability Engineer

Altana AI • San Francisco, CA, United States

Full-time

AI can be a powerful tool for good in the world – at Altana we apply AI to the world’s largest organized body of supply chain data to power a more resilient, more secure, and more sustainable model... Last updated: 30+ days ago

Senior Site Reliability Engineer – Platform

Icon Ventures • San Francisco, CA, United States

Full-time

At Quizlet, our mission is to help every learner achieve their outcomes in the most effective and delightful way. We blend cognitive science with machine learning to personalize and enhance the lear... Last updated: 3 days ago

Site Reliability Engineer I

Prosper • San Francisco, CA, United States

Full-time

As a Site Reliability Engineer I at Prosper, you will play a crucial role in enhancing the reliability, scalability, and maintainability of our technology platform. This entry-level position is desi... Last updated: 25 days ago

Site Reliability Engineer

PsiQuantum • Palo Alto, CA, United States

Full-time

Quantum computing holds the promise of humanity's mastery over the natural world, but only if we can build a. PsiQuantum is on a mission to build the first real, useful quantum computers, capable of... Last updated: 30+ days ago

Senior Staff Site Reliability Engineer - Platform

Icon Ventures • San Francisco, CA, United States

Full-time

At Quizlet, our mission is to help every learner achieve their outcomes in the most effective and delightful way. Our $1B+ learning platform serves tens of millions of students every month, includin... Last updated: 3 days ago

Staff Site Reliability Engineer

Checkr • San Francisco, CA, United States

Full-time

Checkr is building the data platform to power safe and fair decisions. Established in 2014, Checkr’s innovative technology and robust data platform help customers assess risk and ensure safety and c... Last updated: 21 days ago

Senior Site Reliability Engineer

Corelight • San Francisco, CA, United States

Full-time

Senior Site Reliability Engineer. We are looking for a Senior Site Reliability Engineer to design, automate, and scale cloud and hybrid platforms that power AI / ML workloads and SaaS services. You'll... Last updated: 6 days ago

Staff Site Reliability Engineer

Redwood Materials, Inc. • San Francisco, CA, United States

Full-time

Redwood is localizing a global battery supply chain that seamlessly integrates recovery, reuse, and recycling—keeping critical minerals in circulation and driving the energy transition. Founded in 2... Last updated: 2 days ago

Senior / Staff Site Reliability Engineer

Crusoe • San Francisco, CA, United States

Full-time

Crusoe Energy is on a mission to unlock value in stranded energy resources through the power of computation. We aim to align the long term interests of the climate with the future of global computin... Last updated: 30+ days ago

Senior Staff Site Reliability Engineer - Platform

Quizlet • San Francisco, CA, United States

Full-time

Senior Site Reliability Engineer

Alembic Technologies • San Francisco, CA, United States

Full-time

Senior Site Reliability Engineer. This range is provided by Alembic Technologies. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more. We’re looking fo... Last updated: 3 days ago

Senior Site Reliability Engineer

Loft Orbital • San Francisco, CA, United States

Full-time

Senior Site Reliability Engineer. This range is provided by Loft Orbital. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more. Loft Orbital is revoluti... Last updated: 30+ days ago

Staff / Principal Site Reliability Engineer

The Resume Database • Redwood City, CA, United States

Full-time

Staff / Principal Site Reliability Engineer. You’ll architect scalable solutions, navigate complex technical challenges independently, and deliver results und... Last updated: 6 days ago

Site Reliability Engineer - Observability

Rivian and Volkswagen Group Technologies • Palo Alto, CA, United States

Full-time

Senior Site Reliability Engineer (SRE). RivianVW's Data Platform - Production Engineering team. In this role, you will design, implement, and scale robust observability systems to ensure the health, ... Last updated: 6 days ago

Senior Site Reliability Engineer

Checkr • San Francisco, CA, United States

Full-time

Senior Site Reliability Engineer

Hive • San Francisco, CA, United States

Full-time

Hive is the leading provider of cloud-based AI solutions to understand, search, and generate content, and is trusted by hundreds of the world's largest and most innovative organizations. The company... Last updated: 30+ days ago

Site Reliability Engineer

Rockwoods Inc • Pleasanton, CA, United States

Full-time

Note: Candidates must have relevant experience in Medical / Healthcare domains; this is mandatory. Senior SRE Engineer - Pleasanton, 5 days office. Primary work: 24x7 On-call support and setting up mo... Last updated: 30+ days ago

Staff Site Reliability Engineer - Platform

Icon Ventures • San Francisco, CA, United States

Full-time