Talent.com
Staff / Principal Site Reliability Engineer

Veza, San Francisco, CA, United States
11 hours ago
Job type
  • Full-time
Job description

We are seeking an exceptional Staff / Principal Site Reliability Engineer to lead critical infrastructure initiatives and drive innovation across our organization. You’ll architect scalable solutions, navigate complex technical challenges independently, and deliver results under tight deadlines in a fast-paced environment. You will work cross‑functionally alongside builders who have helped shape the success of companies such as Google, Okta, AWS, and Snowflake.

Strategic Leadership & Technical Execution

  • Lead enterprise‑wide reliability and infrastructure projects across multiple teams with high autonomy
  • Navigate ambiguous problem spaces and deliver innovative solutions under tight deadlines
  • Architect and deploy solutions for Cloud Prem and SaaS customers at scale
  • Drive technical innovation and establish SRE best practices across the organization
  • Respond to critical incidents, lead root cause analysis, and implement long‑term resolutions
  • Develop automation solutions to streamline operations and reduce manual workload
  • Participate in on‑call rotation and ensure effective incident handoff and documentation

Cross‑Functional Collaboration & Communication

  • Partner with Engineering, Product, and Customer Success teams to align reliability goals with business objectives
  • Communicate complex technical concepts effectively to technical and non‑technical audiences, including executives
  • Influence technical decisions across teams through thought leadership and demonstrated expertise
  • Build consensus and drive adoption of new tools, processes, and architectural patterns

Customer‑Facing Technical Leadership

  • Provide Tier 2/3 technical support to enterprise customers for complex troubleshooting
  • Work directly with customer technical teams to resolve deployment, configuration, and integration challenges
  • Conduct technical onboarding and provide expert guidance on platform architecture and best practices
  • Create customer‑facing documentation, troubleshooting guides, and run‑books
  • Lead customer calls and technical discussions as a trusted advisor

Team Development

  • Mentor SRE and engineering team members, elevating technical capabilities
  • Foster a culture of reliability, operational excellence, and continuous improvement

You have:

Required Experience

  • BS degree in Computer Science or related field (or equivalent practical experience)
  • 7+ years in Site Reliability Engineering, DevOps, or Infrastructure Engineering
  • Proven track record leading large‑scale, cross‑team infrastructure projects from conception to production
  • Demonstrated ability to work autonomously on ambiguous projects with tight deadlines

Technical Expertise

  • 5+ years with AWS (VPC, EC2, RDS, EKS, CloudFormation) and cloud automation
  • Expert‑level experience with Kubernetes, Helm, Linux, and Terraform
  • Strong experience with the GitOps model, distributed version control, and CI/CD pipelines
  • Proficiency with monitoring tools (Prometheus, Grafana, Datadog)
  • Strong programming / scripting skills (Python, Go, Bash) for automation
  • Deep understanding of distributed systems, microservices, and reliability patterns
  • Experience with Bazel and CueLang a plus

Leadership & Communication

  • Exceptional ability to articulate complex technical concepts to diverse audiences
  • Track record of driving technical change across organizational boundaries
  • Successfully delivered multiple complex projects under tight deadlines
  • Strong customer service orientation with patience and empathy

Work Style

  • Thrives in ambiguous environments and makes progress without perfect information
  • Hands‑on, "can do" attitude with bias for action
  • Low ego and high intellectual curiosity
  • Comfortable working across time zones
  • Self‑motivated with strong ownership mentality

Compensation Disclosure

$184,000–$240,000 USD

Compensation depends on skills, qualifications, experience, and work location. Variable compensation such as commission is not included.

Our Culture

  • Ownership Mindset
  • Act with Integrity
  • Guardians of our Customers
  • Opinionated Humility
  • Build Trust, Earn Trust

Veza is proud to be an equal opportunity employer. We are committed to equal employment opportunities regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, or other applicable legally protected characteristics. We also consider qualified applicants according to applicable federal, state, and local laws. If a candidate with a disability requires an accommodation during the recruitment process, please email recruiting@veza.com.

