Lead Site Reliability (DevOps) Engineer

Roberts Recruiting, LLC • Boston, MA, United States

5 hours ago

Job type

Full-time

Job description

Lead Site Reliability Engineer

We’re looking for a top‑notch, hands‑on SRE to lead our small, talented infrastructure engineering team and help us raise our game in designing, building, and operating high‑performance, highly available systems.

We’re backed by Insight Venture Partners and Iconiq Capital and on a path to $1B in 2019. We’ll get there, and even more surely if you come help us.

Every engineer is responsible for the software they build, and SREs play a critical part in providing the tools, practices, and expertise to help them succeed.

Our production systems are hosted on AWS and run a large Ruby on Rails web application and a handful of smaller services in Ruby, Node.js, and Java. We currently deploy 3–5 times a day. Our systems are stable and fire drills are rare. Technologies we’re currently using include:

Amazon Web Services (EC2, ELB, S3, RDS, ElastiCache) and Ubuntu Linux
Postgres, Redis, Memcached, Elasticsearch
Chef, ServerSpec, Terraform, New Relic, Datadog, Sumo Logic and Test Kitchen

In this mission‑critical role, you would:

Design, build, and maintain the core infrastructure of our product

Actively manage the backlog for our infrastructure team and work closely with other SREs on the team to provide coaching and mentorship

Help us increase developer productivity and get to true continuous delivery

Develop operational and security standards and champion operational excellence and secure coding practices

Partner with engineering teams closely to educate and consult

Participate in solution design for new features, products, systems and tooling

Debug complex problems across the whole stack

Continually monitor application and system performance and costs, generate actionable insights, and either implement or advocate for them

Participate in on‑call rotations, along with every member of the engineering team

Ruthlessly eliminate repetitive manual tasks and recurring errors

Ensure we are always employing best‑of‑breed tooling for all our infrastructure and automation needs

Collaboratively plot the course for the maturation and growth of our infrastructure

Participate (and sometimes run point) in handling production incidents

Work closely with engineering teams to conduct root cause analysis for production incidents, and evolve infrastructure and tooling

This role might be that rare opportunity if you:

Thrive in a highly collaborative, no red‑tape, rapid‑growth environment

Love building tooling and infrastructure to help developers be more productive

Love eliminating repetitive manual tasks through automation

Have a healthy appreciation of what it means to work in production

Have solid Unix command line and systems chops

Have experience with substantial, distributed SaaS or eCommerce systems

Can point to a solid track record of success leading small‑to‑medium infrastructure teams

Have vision and well‑informed opinions about how to build infrastructure for a high‑growth, technology‑driven company that’s headed towards the $1B mark
