Talent.com
Principal Site Reliability Engineer
Expel • Remote, United States
30+ days ago
Job type
  • Full-time
  • Remote
Job description

Your passion for uptime was forged from experience in production and refined through incident response. You’re an Expel Principal Site Reliability Engineer - a protector, champion, and leader of Expel's reputation for service reliability.

Innovation comes naturally to you, but you're also eager to help others. You understand that operational reliability is a shared mission across all of engineering, and that your role is to make it as easy as possible for Expel to achieve that mission. You spend your mornings collaborating with architects and product stakeholders to outline the next quarter’s reliability initiatives, then the afternoon pair-programming with a junior SRE to mentor them in debugging a tricky Kubernetes deployment.

You apply your dedication to reliability and collaboration with the broader SRE community to ensure Expel maintains outstanding reliability standards within the cloud native ecosystem.

You take pride in all the nines of uptime you’ve achieved, but you know that an SRE’s job is never done!

What Expel can do for you

  • Provide an opportunity to grow and maintain reliability-focused platform features within a cloud native engineering platform using modern infrastructure and tooling (Kubernetes / GKE / EKS runtime, HashiCorp toolset, etc.)
  • Provide you a mission you can get behind: stopping evil hackers so our customers can focus on their business
  • Be included in a company focused on creating opportunities to do interesting work and creating space for employees to learn and grow
  • An opportunity to contribute to a best-in-class product
  • A leadership team that’s embraced modern Site Reliability principles, as outlined by Google and other industry leaders.

What you can do for Expel

  • Lead project work to build and maintain platform features that cut across the Expel product’s reliability, networking, and cloud infrastructure.
  • Contribute by pushing IaC commits daily, with occasional opportunities to write and test application code in Python, Golang, and JavaScript
  • Mentor and motivate service owners on how to use the platform in order to deploy, measure, monitor, and operate their own services at scale.
  • Participate in a weekly support rotation that includes taking the on-call pager and providing nearly on-demand working-hours support to platform users.
  • Lead incident response, triage, and root cause analysis support
  • Poke fun at our leadership team in creative ways.

What you should bring with you

  • A passion for learning and improving your work product
  • Significant experience operating Kubernetes within highly distributed environments
  • Experience running systems in GCP or AWS
  • Exposure to monitoring and observability infrastructure and standard methodologies
  • An understanding of infrastructure-as-code practices, tools, and patterns
  • Some experience developing software in Linux environments, preferably with Python and/or Golang
  • A customer-minded approach that enables the success of platform users as well as building trust across the organization.
  • A collaborative disposition that allows you to work optimally on and across teams
  • Six years of systems experience either in operations or development
  • Missing some items on the list? That's ok! We still want to talk to you!

How our team works together

We build and run teams where everyone is pulling in the same direction and is learning from each other:

  • We work out of a shared backlog
  • We pair-program weekly, as it makes sense
  • We peer-review everything
  • We do weekly blame-free retros to reinforce what’s going well, so we do more of it, and surface what’s not going well, so we can do something about it. Same thing for projects and significant operational problems.

Our hiring process

We respect your time. You’ll hear from us by the end of the next business day after completing an interview.

We also have a goal that all Expletives have a great manager and have a voice in how their team is run and who runs it. It’s not the shortest process in the industry, but you’ll get to meet nearly everyone you’ll work with day-to-day and your Engineering leadership. New Expletives consistently say our interview process gave them an accurate picture of what it’s like to work here.

Here’s our 3-stage process for this position (5.5 hours total interviewing time):

  • Chat with a recruiter (30 min)
  • Video interview with hiring manager (Engineering Manager) (60 minutes)
  • Pair programming interview (with two engineers) (60 minutes)
  • “Virtual onsite interview” (three sessions of 60 minutes each, which can be scheduled contiguously or broken up):
      • Engineering leadership (Engineering Director and Manager of Delivery Experience)
      • System design interview (with two engineers)
      • Technology and skills interview (with two engineers)

Additional details

The base salary range for this role is between $167,300 USD and $242,600 USD + bonus eligibility and equity.

We believe in paying transparently and equitably. Your salary will ultimately be based on factors such as your experience, skills, team equity, and market data. You’ll also be eligible for unlimited PTO (which we model and encourage), work location flexibility, up to 24 weeks of parental leave, and really excellent health benefits.

We’re only hiring those authorized to work in the United States. We do not currently sponsor immigration visas.

We're an Equal Opportunity Employer: You'll receive consideration for employment without regard to race, sex, color, religion, sexual orientation, gender identity, national origin, protected veteran status, or on the basis of disability.

We’ll ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please let us know if you need accommodation of any kind.
