Lead Site Reliability Engineer
We’re looking for a top‑notch, hands‑on SRE to lead our small and talented infrastructure engineering team and help us elevate our game when it comes to designing, building and operating high‑performance and highly‑available systems.
We’re backed by Insight Venture Partners and Iconiq Capital and are on a path to $1B in 2019. We’ll get there, and even more surely if you come help us.
Every engineer is responsible for the software they build, and SREs play a critical part in providing the tools, practices, and expertise to help them succeed.
Our production systems are hosted on AWS and run a large Ruby on Rails web application and a handful of smaller services in Ruby, Node.js, and Java. We currently deploy 3–5 times a day. Our systems are stable and fire drills are rare. Technologies we’re currently using include:
- Amazon Web Services (EC2, ELB, S3, RDS, ElastiCache) and Ubuntu Linux
- Postgres, Redis, Memcached, ElasticSearch
- Chef, ServerSpec, Terraform, NewRelic, DataDog, Sumo Logic and Test Kitchen
In this mission‑critical role, you would:
- Design, build, and maintain the core infrastructure of our product
- Actively manage the backlog for our infrastructure team and work closely with other SREs on the team to provide coaching and mentorship
- Help us increase developer productivity and get to true continuous delivery
- Develop operational and security standards and champion operational excellence and secure coding practices
- Partner with engineering teams closely to educate and consult
- Participate in solution design for new features, products, systems and tooling
- Debug complex problems across the whole stack
- Continually monitor application / system performance and costs, generate actionable insights and either implement or advocate for them
- Participate in on‑call rotations, along with every member of the engineering team
- Ruthlessly eliminate repetitive manual tasks and recurring errors
- Ensure we are always employing best‑of‑breed tooling for all our infrastructure and automation needs
- Collaboratively plot course for the maturing and growth of our infrastructure
- Participate (and sometimes run point) in handling production incidents
- Work closely with engineering teams to conduct root cause analysis for production incidents, and evolve infrastructure and tooling
This role might be that rare opportunity if you:
- Thrive in a highly collaborative, no red‑tape, rapid‑growth environment
- Love building tooling and infrastructure to help developers be more productive
- Love eliminating repetitive manual tasks through automation
- Have a healthy appreciation of what it means to work in production
- Have solid Unix command line and systems chops
- Have experience with substantial, distributed SaaS or eCommerce systems
- Can point to a solid track record of success leading small‑to‑medium infrastructure teams
- Have vision and well‑informed opinions about how to build infrastructure for a high‑growth, technology‑driven company that’s headed towards the $1B mark