Talent.com
Principal Site Reliability Engineer
No longer accepting applications

iSpot • Bellevue, WA, US
Posted 8 days ago
Contract type
  • Full-time
  • Part-time
  • Permanent
Job Description

Immigration / Work Authorization Notice: At this time, iSpot does not provide visa sponsorship or immigration support for this role. Applicants must already be authorized to work in the United States on a full-time, permanent basis without the need for current or future sponsorship.

iSpot competes for the best talent. Our compensation packages consist of salary and equity in one of Seattle's hottest start-ups, as well as other standard benefits. Most importantly, we provide a really interesting working experience, and the chance to contribute to the success of something great.

What You'll Be Part Of:

iSpot.tv is changing how brands, agencies, and networks measure and assess the impact of TV advertising. We deal with BIG data, operating mainly in AWS with multiple Kubernetes clusters and thousands of servers. We are looking for an experienced SRE leader with the skills and passion to make a significant impact on our ecosystem. You will have a wide array of projects to tackle, with ample opportunities for growth.

You will be a key member of our SRE leadership team, focused on empowering developers to build, test, and deploy applications faster and more efficiently. You will both lead the team and remain hands-on in designing, building, and maintaining the tools, platforms, and processes that improve our engineering teams' productivity and streamline the software development lifecycle. Your work will directly impact developer happiness and the speed at which we can deliver innovative features to our customers.

Responsibilities:

We are seeking a seasoned and strategic Lead/Principal Site Reliability Engineer to drive the reliability, scalability, and performance of our core production systems while significantly enhancing the internal developer experience. This role sits at the intersection of operations and development, requiring deep technical expertise, strong leadership, and a passion for optimizing the entire software development lifecycle (SDLC).

Our team consists of senior engineers who work together with minimal supervision to attain those goals. Candidates must possess deep operational experience with AWS and Kubernetes to support teams utilizing these systems. You will lead the technical direction of the team while remaining a key individual contributor. You will be responsible for creating a culture of engineering excellence, designing self-service platforms, and fostering alignment across all engineering teams to accelerate product delivery and maintain world-class service stability.

The key responsibilities are:

I. System Reliability and Operations (SRE Focus)

  • Platform Design and Management: Architect, build, and maintain scalable, highly available, and reliable cloud infrastructure in AWS leveraging modern container orchestration technologies.
  • Data Pipeline Reliability: Serve as the reliability and cost optimization expert for high-volume, data-intensive workloads. Focus on optimizing and ensuring the stability of distributed data processing engines, specifically Apache Spark and related ecosystems (e.g., EMR, Databricks, Glue).
  • Observability and Monitoring: Establish comprehensive observability practices by defining SLIs/SLOs and implementing advanced monitoring, alerting, and logging solutions to quickly identify and resolve system anomalies.
  • Automation: Drive automation across all operational aspects, including infrastructure provisioning (Terraform), scaling, deployment, and incident response, minimizing toil and manual effort.
  • Incident Management: Lead and participate in the incident response lifecycle, performing thorough post-mortems to derive actionable insights and implement preventative measures to improve system resilience.

II. Developer Experience and Productivity (DevEx Focus)

  • Platform Strategy: Design, implement, and champion self-service tools, internal developer portals, and services that empower engineering teams to manage their infrastructure and deployments independently and efficiently.
  • CI/CD Optimization: Own and continuously improve the CI/CD pipelines, reducing build times, streamlining deployment workflows, and integrating best practices for testing, security (Shift Left), and code quality. Maintain and improve our container orchestration and deployment tools, leveraging Kubernetes, Helm, and ArgoCD to create seamless developer workflows.
  • KPIs: Develop, implement, and maintain a set of key performance indicators (KPIs) to measure and improve the developer experience across all of Engineering.
  • Mentorship and Documentation: Guide and mentor senior engineers, promoting SRE/DevEx principles. Develop clear, comprehensive documentation and tutorials to ensure seamless adoption of new tools and platforms.
  • Cost and Efficiency: Strategically identify and implement opportunities for cloud cost optimization and resource efficiency without compromising reliability or performance.

III. Strategic Leadership and Cross-Team Alignment

  • Architecting the Roadmap: Define, champion, and communicate the long-term technical roadmap for the SRE and DevEx platforms, balancing immediate operational needs with strategic, future-state goals.
  • Driving Cross-Team Alignment: Act as a critical liaison between infrastructure, security, and product development teams. Proactively drive cross-team alignment on architectural standards, tooling choices, and development workflows to ensure consistency and shared accountability for system health.
  • Bottleneck Identification and Mitigation: Systematically identify engineering bottlenecks, friction points, and points of organizational toil within the SDLC. Implement targeted solutions, whether technical, process-based, or organizational, to mitigate these constraints and enhance overall engineering velocity.
  • Planning and Execution: Collaborate with engineering leadership to transform the strategic roadmap into actionable, prioritized plans, securing cross-functional buy-in and resources for successful execution.

Qualifications and Education Requirements:

  • Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
  • 10+ years of relevant experience in software engineering, cloud architecture, and/or Site Reliability Engineering, with at least 3 years in a leadership or lead contributor role.
  • Deep expertise in AWS, including EKS, ECR, RDS, SQS/SNS, VPC, MWAA, and S3.
  • Strong proficiency in Infrastructure as Code (IaC) tools (e.g., Terraform, CloudFormation).
  • Specialized experience in optimizing large-scale data platforms, specifically with Apache Spark. Proven ability to profile, troubleshoot, and tune Spark jobs for performance, cost, and reliability.
  • 5+ years of experience with Kubernetes and containerization in general, including associated tools (kubectl, Helm, ArgoCD).
  • Strong knowledge of AWS cost optimization.
  • TCP/IP networking, including routing and AWS security groups.
  • Excellent knowledge of CI/CD concepts and experience developing associated pipelines in CircleCI.
  • Proficiency in high-level scripting languages, including shell scripting, Python, and/or JavaScript.
  • Experience with OTel and monitoring tools such as Splunk or Datadog. Experience with native AI observability tools is a plus.
  • Experience evaluating and rolling out GenAI tools to improve developer efficiency.
  • Excellent communication, collaboration, and stakeholder management skills, with proven experience driving technical initiatives across multiple teams.
  • Experience researching and selecting new/modern developer toolsets and assisting teams in adopting them, including vendor assessments, security assessments, and the procurement process.
  • Experience in an Ad-Tech or "BIG Data" processing organization is highly preferred.

Target cash compensation range: $163,620 - $212,710 USD annually

    We are committed to providing competitive, market-informed compensation. The cash compensation above includes base salary, variable commission for employees in eligible roles, and annual bonus targets for eligible roles. In addition to cash compensation, all full-time iSpotters are eligible to participate in iSpot's equity plan to receive stock options. Non-exempt roles will also be eligible for (pre-approved) overtime pay. Individual compensation packages are influenced by different factors unique to each candidate, including their skills, experience, qualifications, and other job-related reasons.

    For more information on total rewards package, go HERE

    Hybrid & Flexible Workplace Policy

    iSpot supports a hybrid and flexible workplace. Depending on location and work responsibilities, employees may be designated as full-time office-based, part-time office-based, or fully remote. A hybrid work schedule means that you work in the office some days and from home on others. The best hybrid workplaces allow for flexibility while also encouraging consistency.

    Those who live near one of our offices (Bellevue, WA; El Segundo, CA; New York, NY) will work a hybrid schedule, coming into their local office 1-3 days a week. Those in roles that are not office-based and who live farther from our offices will work a fully remote schedule. If you have questions about the details of our hybrid & flexible workplace policy, please let your recruiter know and they will discuss them with you.


    If you don't feel you meet every single requirement for the role, don't rule yourself out. Please apply anyway!

    iSpot is an equal opportunity employer. All applicants will receive consideration for employment without regard to race, ethnicity, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please contact our HR team.

    California Residents applying for positions at iSpot can access our California Consumer Privacy Act here.
