Staff Site Reliability Engineer

Attentive • United States

30+ days ago

Job type

Full-time

Job description

Attentive® is the AI-powered mobile marketing platform transforming the way brands personalize consumer engagement. Attentive enables marketers to craft tailored journeys for every subscriber, driving higher recurring revenue and maximizing campaign performance. Activating real-time data from multiple channels and advanced AI, the platform personalizes content, tone, and timing to deliver 1:1 messages that truly resonate.

With a top-rated customer success team recognized on G2, Attentive partners with marketers to provide strategic guidance and optimize SMS and email campaigns. Trusted by leading global brands like Neiman Marcus, Samsung, Wayfair, and Dyson, Attentive ensures enterprise-grade compliance and deliverability, supporting trillions of interactions across more than 70 industries. To learn more or request a demo, visit www.attentive.com or follow us on LinkedIn, X (formerly Twitter), or Instagram.

Attentive’s growth has been recognized by Deloitte’s Fast 500, LinkedIn’s Top Startups, and Forbes Cloud 100, all thanks to the hard work of our global employees!

About the Role

Our Platform Infrastructure team is the backbone of everything we do at Attentive, providing a resilient and cost-effective platform that seamlessly handles billions of events from over 100 million customers daily. We own everything from compute, persistence, and networking to observability and deployments. Joining our team offers a high-growth career opportunity to collaborate with some of the world’s most talented engineers in a high-performance, high-impact culture.

As part of the Infrastructure and Platform organization, the Production Engineering Team is focused on delivering a fast and reliable platform that empowers Attentive engineers to deliver solutions quickly and safely. We build scalable systems that automate routine tasks so we can focus on other impactful efforts. Reliability, scalability, and security are our areas of expertise. We focus on release, observability, and cost optimization. Our mission is to create robust platforms and tools that allow stakeholders to concentrate on delivering exceptional products.

As a Staff Engineer, you will take a strategic role in designing and implementing solutions that enhance the reliability and scalability of our systems, while mentoring others and influencing technical roadmaps across the organization.

What You'll Accomplish

Design and Deliver High-Impact Solutions: Design and implement systems that enhance reliability, observability, traceability, and incident management, ensuring the platform scales effectively
Lead Strategic Initiatives: Take ownership of cross-team collaborations and drive impactful projects by providing technical leadership and guidance
Partner Across Teams: Collaborate with engineers from AI/ML, Data, Platform, Product, and other groups to deliver best-in-class services
Establish Standards and Best Practices: Define and enforce production standards, processes, and tools to ensure operational excellence
Champion Reliability Goals: Advocate for and implement SLIs, SLOs, and other reliability-focused metrics across the engineering organization
Mentorship and Knowledge Sharing: Guide and mentor team members, fostering technical growth and helping to develop the next generation of engineering leaders
Innovate and Inspire: Drive continuous improvement by bringing creative ideas and challenging the status quo

Your Expertise

7+ years of experience in Production Engineering, Backend Engineering, SRE, DevOps, or a similar role

Proficient Problem-Solver: Strong coding ability in at least one language (e.g., Golang, Python, Java, TypeScript) with the capability to solve complex issues through code

Track Record of Success: Demonstrated experience delivering medium to large-scale projects that drive meaningful improvements in platform reliability and scalability

Reliability Expertise: Deep understanding of production reliability concepts, including SLIs, SLOs, and incident management

Strong Communicator: Excellent verbal and written communication skills with the ability to influence and collaborate across technical and non-technical teams

Fast-Paced Experience: Familiarity with working in dynamic, reliability-focused production environments (preferred)

What We Use

Our infrastructure runs primarily on Kubernetes, hosted in AWS EKS

Infrastructure tooling includes Istio, Datadog, Terraform, Cloudflare, and Helm

Our backend is Java/Spring Boot microservices, built with Gradle, coupled with things like DynamoDB, Kinesis, Airflow, Postgres, PlanetScale, and Redis, hosted via AWS

Our frontend is built with React and TypeScript, and uses tools like GraphQL, Storybook, Radix UI, Vite, esbuild, and Playwright

Our automation is driven by custom and open source machine learning models and lots of data, and is built with Python, Metaflow, Hugging Face 🤗, PyTorch, TensorFlow, and Pandas

You'll get competitive perks and benefits, from health & wellness to equity, to help you bring your best self to work.

For US-based applicants:

The US base salary range for this full-time position is $156,000 - $240,000 annually + equity + benefits

Equity is a substantial part of the total compensation package

Our salary ranges are determined by role, level and location

Attentive Company Values

Default to Action - Move swiftly and with purpose

Be One Unstoppable Team - Rally as each other’s champions

Champion the Customer - Our success is defined by our customers' success

Act Like an Owner - Take responsibility for Attentive’s success

Learn more about AWAKE, Attentive’s collective of employee resource groups.

If you do not meet all the requirements listed here, we still encourage you to apply! No job description is perfect, and we may also have another opportunity that closely matches your skills and experience.

At Attentive, we know that our company's strength lies in the diversity of our employees. Attentive is an Equal Opportunity Employer and we welcome applicants from all backgrounds. Our policy is to provide equal employment opportunities for all employees, applicants, and covered individuals regardless of protected characteristics. We prioritize and maintain a fair, inclusive, and equitable workplace free from discrimination, harassment, and retaliation. Attentive is also committed to providing reasonable accommodations for candidates with disabilities. If you need any assistance or reasonable accommodations, please let your recruiter know.
