Talent.com
Principal Kafka Site Reliability Engineer DevOps
Palo Alto Networks • Santa Clara, CA, United States
30+ days ago
Job type
  • Full-time
Job description

We are reshaping the cybersecurity market through our cloud-delivered security services, and our cloud infrastructure is growing rapidly across a global footprint. We’re looking for great SREs, as well as software engineers interested in production engineering, to help us scale the largest enterprise security cloud infrastructure in the world.

Description

Palo Alto Networks reinvented the enterprise firewall, growing from a start-up to a multi-billion-dollar company. Our Application Framework, the latest offering in our cloud-delivered security services, ingests security events from hundreds of thousands of firewalls deployed across the globe to provide a massive data analytics platform for deep inspection, anomaly detection, and actionable security automation. Our cloud infrastructure hosts a series of massive and complex distributed systems and virtualization software platforms that enable big data processing for security services, sandboxing and malware detection, URL categorization, and malicious site / domain identification, as well as security research and response.

RESPONSIBILITIES:

  • You will maintain and scale production Kafka clusters with very high ingestion rates, ZooKeeper clusters, and other big data pipeline systems such as HDFS.
  • You will work on improving scalability, service reliability, capacity, and performance.
  • You will develop automation code for managing, monitoring, measuring, expanding, and healing clusters.
  • You are an experienced software engineer focused on operations, not just an operator.
  • You will perform Kafka tuning, capacity planning, and deep-dive troubleshooting.
  • You will participate in occasional on-call rotations supporting the infrastructure.
  • You will troubleshoot incidents, formulate hypotheses, test them, and identify root causes.

QUALIFICATIONS:

  • Hands-on experience managing production Kafka clusters.
  • Strong development and automation skills, especially with Python; familiarity with Kafka source code is a plus.
  • Deep understanding of Kafka internals, ZooKeeper, partitioning, topic replication, and mirroring.
  • Excellent monitoring, metrics collection, performance tuning, and troubleshooting skills for distributed systems.
  • Tools-first mindset: building tools to increase efficiency and simplify tasks.
  • Organized, focused on delivery, good communicator, team player, and proactive in ownership.

Learn more about Palo Alto Networks here and check out our fast facts.
