Talent.com
Principal Site Reliability Engineer (SRE)
InStride • Remote, United States
30+ days ago
Job type
  • Full-time
  • Remote
Job description

At InStride, people are our purpose.

We believe that investing in people is the most powerful way to drive success—for individuals and organizations alike.

As a public benefit corporation, we partner with leading employers to unlock opportunities for their employees, providing access to top-tier education programs that align with their employees’ career goals and the company’s business goals.

Our mission goes beyond skill-building; we're here to empower our partners’ employees to advance their careers, elevate their expertise, and achieve meaningful personal and professional growth.

No matter the team you’re on, our dedication to the success of our partners and their employees is what drives us. If you're passionate about making a difference and driving educational and professional advancement, InStride is the place for you.

To get a better feel for our culture, watch more here.

Candidates must be located in one of the following states to be considered eligible for employment: AZ, CA, CO, CT, FL, GA, IL, IN, KS, LA, MD, MA, MI, MO, NV, NH, NJ, NY, PA, OH, OR, TX, VA, WA, WI.

What we're looking for:

We’re looking for a Principal Site Reliability Engineer (SRE) to join InStride’s growing engineering team. This is a highly technical role for an individual contributor who thrives at the intersection of cloud architecture, automation, and reliability engineering. You will be the go-to AWS expert for complex initiatives, setting technical direction and raising the bar for operational excellence across our platform. At InStride, every system you design, every automation you implement, and every safeguard you put in place will directly support our mission of expanding access to life-changing education for working adults around the globe.

Skills we’d love to see you show off:

  • Cloud Architecture & Strategy: Design and optimize AWS environments that balance scalability, resilience, and cost efficiency for enterprise workloads.
  • Technical Leadership & Mentorship: Serve as a trusted technical advisor, guiding engineers on best practices in Kubernetes, DevSecOps, and AWS-native design patterns.
  • Infrastructure as Code Mastery: Build reusable, version-controlled IaC libraries with AWS CDK, Terraform, or CloudFormation to standardize deployments.
  • Security & Compliance by Design: Enforce least-privilege IAM, encryption-by-default, and policy-as-code guardrails to meet security and regulatory standards.
  • Observability & Reliability Engineering: Define SLIs/SLOs, manage error budgets, and implement monitoring strategies with Prometheus, Grafana, and AWS-native tools.
  • CI/CD Excellence: Optimize automated pipelines with Harness and GitHub, enabling faster, safer, and more reliable software delivery.
  • Networking & Resilience: Architect secure, performant VPCs, load balancing, and multi-region failover strategies with AWS networking services.
  • Automation & Self-Service Enablement: Deliver developer-friendly automation and Internal Developer Portal (IDP) capabilities that empower teams to provision infrastructure without SRE intervention.

Who you are:

  • 10+ years of experience in SRE, DevOps, or Platform Engineering roles operating production AWS workloads.
  • Hands-on expertise with AWS EKS, Kubernetes networking, Helm, autoscaling frameworks (Karpenter or Cluster Autoscaler), serverless architectures, and API gateways.
  • Proven delivery of service mesh solutions (Istio, Linkerd, or AWS App Mesh) for secure and observable service-to-service communication.
  • Proficiency with Infrastructure as Code (IaC) using AWS CDK (TypeScript preferred, or Python), Terraform, or CloudFormation.
  • Strong programming and automation skills in Go, Python, or TypeScript, with additional proficiency in Bash.
  • Demonstrated experience implementing policy-as-code with OPA/Rego or similar tooling integrated into CI/CD pipelines.
  • Solid understanding of SLI/SLO/error-budget methodologies and hands-on experience with monitoring and alerting stacks (Prometheus, Grafana, CloudWatch, Groundcover).
  • Deep knowledge of AWS security best practices, including IAM policies, encryption, OS hardening, and compliance enforcement.
  • Excellent communication skills with the ability to translate reliability metrics into business impact and guide incident and post-mortem discussions.
  • Experience mentoring engineers and influencing enterprise AWS and DevOps strategies without direct management responsibilities.
  • Familiarity with Internal Developer Portals (Backstage, Port, Cortex) and self-service automation is a strong plus.

How you will create impact:

  • Elevate platform reliability: Design and operate multi-region, fault-tolerant systems that ensure InStride’s learning platform is always available for learners and partners.
  • Advance automation at scale: Deliver Infrastructure as Code libraries, CI/CD pipelines, and self-service capabilities that reduce operational toil and accelerate developer productivity.
  • Champion security and compliance: Implement defense-in-depth strategies, policy-as-code guardrails, and proactive monitoring to protect sensitive data and maintain trust.
  • Drive observability maturity: Define and enforce SLIs/SLOs, establish error-budget policies, and build monitoring frameworks that inform release readiness and operational decisions.
  • Enable seamless service connectivity: Deploy and manage service mesh solutions that secure, monitor, and optimize service-to-service communication across Kubernetes workloads.
  • Influence technical direction: Partner with engineering and security stakeholders to shape InStride’s AWS strategy, ensuring scalability, resilience, and cost efficiency.
  • Mentor and uplift engineers: Share expertise, lead design reviews, and guide teams toward modern DevOps and SRE practices, raising the technical bar across the organization.

Compensation

At InStride, final offer amounts are dependent on multiple factors including location, depth of experience, interview performance and equity with other team members.

We encourage you to talk with your recruiter to learn more about the total compensation and benefits available for this role.

Compensation range:

$165,000 - $185,000 USD

We are looking for someone who is not only technically skilled, but also enthusiastic about making a meaningful impact. If this description resonates with you, we're excited about the possibility of having you on our team. As a skills-driven employer, we encourage you to apply if there is a skill fit, even in the absence of years of experience.

Don’t meet every single requirement? Studies have shown that women and people of color are less likely to apply to jobs unless they meet every single qualification. At InStride, we are dedicated to building a diverse, inclusive, and authentic workplace, so if you’re excited about this role but your past experience doesn’t align perfectly with every qualification in the job description, we encourage you to apply anyway. You may be just the right candidate for this role!

Benefits @ InStride

As an organization that champions investing in people, it’s critical we walk the talk. That’s why InStride employees are eligible to enroll in 2,800+ online certificate and degree programs through our Step Forward program. Unlike traditional tuition reimbursement programs, InStride covers your tuition upfront, regardless of your course of study, degree type, or school, and employees are eligible starting on Day 1.

This role is also eligible for the following benefits:

  • 401(k) plan with company match
  • Flexible vacation policy
  • Paid family leave
  • Best-in-class health care benefits
  • And more!

InStride Diversity and Inclusion Statement

At InStride, we foster a culture of belonging, we support authenticity and intersectionality, and we embrace and appreciate our differences. We do this by building a diverse pipeline of talent and ensuring equitable access to opportunities, information and leadership. We celebrate diversity and are committed to creating an inclusive environment for all employees.

If you have a disability or special need that requires accommodation, please let your recruiter know.

Policies & Disclosure

InStride recommends employees have their COVID vaccinations. InStride may require employees to have COVID vaccination before entering the office or attending any InStride-related event in the future. However, we do not require this at this time.

For questions on how we use personal information of job applicants, please refer to InStride's Job Applicant Privacy Policy.

Beware of recruiting scams. InStride does not require a financial transaction or any financial account information to be eligible for employment. If you receive a message purporting to be from InStride asking you for a financial transaction, your financial account information, or any other sensitive information, please do not respond and let us know immediately at recruiting@instride.com.

About InStride

InStride is a human capital management company that helps organizations retain talent, upskill employees, and fill critical workforce roles through education programs. By breaking down barriers to learning, fostering career growth aligned with organizational goals, and simplifying program management, InStride delivers lasting impact. Partnering with forward-thinking companies like Labcorp, Adidas, and SSM Health, InStride drives meaningful social and business outcomes by providing access to life-changing education. Visit instride.com or follow InStride on LinkedIn for more information and up-to-date news.
