Director, Site Reliability Engineering - Infrastructure Platform

Okta, San Francisco, CA, United States
3 days ago
Job type
  • Permanent
Job description

Okta is The World’s Identity Company. Okta provides secure access, authentication, and automation, placing identity at the core of business security and growth.

The Infrastructure Platform and Shared Services Team

Okta authenticates, authorizes, and provisions millions of users a day. The service is hosted on Amazon Web Services (AWS) across multiple availability zones and geographically separated regions, and it is designed for high throughput and 99.999% availability. We’re looking for a technical leader to help us continue scaling the service with great people and reliable, cost-effective, and efficient infrastructure, processes, and tooling.

What You’ll Be Doing

  • Lead the infrastructure platform and shared services organization, and drive initiatives across the SRE and Infrastructure organization.
  • Lead the DevOps transformation, the microservice journey, and next‑generation infrastructure platform capabilities in partnership with architects and product engineering.
  • Build a world‑class observability platform and monitoring capabilities enabled with self‑service.
  • Accelerate the velocity of SRE and product engineering by developing robust platforms, powerful tooling, and intuitive self‑service capabilities.
  • Own the design and operation of scalable, self‑service Cloud infrastructure platforms (e.g., Kubernetes, service mesh, CI/CD pipelines, IaC, and Edge Infrastructure).
  • Lead, mentor, and grow a high‑performing team of engineers and managers across platform, infrastructure, and shared services domains.
  • Perform engineering design evaluations and ensure the completion of projects within resource, budget, and scheduling constraints.
  • Improve SDLC processes for Cloud infrastructure as code, including the maturity of CI/CD pipelines and of change and release management.
  • Manage service and business expectations and prioritize resource allocation.
  • Maintain a deep knowledge of industry best practices, evolving trends, and technologies.

What You’ll Bring To The Role

  • 8+ years of experience in technical leadership & people management.
  • Extensive experience using Agile and DevOps methodologies to build product infrastructure and shared services at scale.
  • 3+ years of experience running large‑scale infrastructure platforms supporting a SaaS/Cloud service in a public Cloud, preferably AWS. Experience supporting a multi‑Cloud environment is a plus.
  • Strong expertise in cloud‑native architectures, containerization (Kubernetes), IaC (Terraform), and CI/CD pipelines.
  • Strong background and hands‑on experience in software development, PaaS, and automation.
  • Deep experience with building and operating observability platforms and monitoring tools (Grafana, Splunk, APM, etc.) in a large‑scale environment.
  • Demonstrated ability to lead cross‑functional teams and manage large‑scale programs.
  • Effective verbal and written communication skills, and strong interpersonal skills.
  • Degree in Computer Science or a related field, or equivalent experience.
Additional Requirements

  • This position requires the ability to access federal environments and/or protected federal data. As a condition of employment for this position, the successful candidate must be able to submit documentation establishing U.S. Person status (e.g., a U.S. Citizen, National, Lawful Permanent Resident, Refugee, or Asylee; see 22 CFR 120.15) upon hire.
Compensation and Benefits

Annual base salary range for candidates located in California: $266,000 to $398,000 USD. Okta offers equity (where applicable), bonus, and benefits, including health, dental and vision insurance, 401(k), flexible spending account, and paid leave (including PTO and parental leave). To learn more about our Total Rewards program, please visit https://rewards.okta.com/us.

End‑of‑Job Legal Statements

Okta is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, marital status, age, physical or mental disability, or status as a protected veteran. We also consider for employment qualified applicants with arrest and conviction records, consistent with applicable laws. If reasonable accommodation is needed to complete any part of the job application, interview process, or onboarding, please use this form to request an accommodation.
