Site Reliability Engineer

P2P • San Francisco, CA, United States

30+ days ago

Job type

Full-time

Job description

Our mission is to bring web3 to a billion people by providing builders with the tools they need to build exceptional onchain products. Alchemy is the only complete developer platform that offers the powerful APIs, SDKs, and tools necessary to build and scale onchain apps and rollups.

Our infrastructure powers 70% of the top web3 teams, 90%+ of web2 companies building in web3 and 100+ million end users. Our customers include top web3 brands like Polymarket, OpenSea, Circle, WorldCoin, as well as major global brands like Shopify and Adobe.

The Alchemy team draws from decades of deep expertise in massively scalable infrastructure, AI, and blockchain from leadership roles at leading companies and universities like Google, Microsoft, Facebook, Stanford, and MIT.

We're backed by the world's leading VCs and institutions, including Lightspeed, Silver Lake, a16z, Coatue, Pantera, Addition, Stanford University, Coinbase, and Charles Schwab, among others.

The Role

As an engineer in the Infrastructure department at Alchemy, you will collaborate with our engineering team to design, deploy, and continuously improve the infrastructure supporting our globally used developer platform. Your focus will be on enhancing developer productivity and ensuring product reliability as we scale.

The Infrastructure team’s mission is to provide the infrastructure, tooling and expertise that Alchemy engineers need to ship, scale and operate high-quality products for our customers in a fast, safe and cost-efficient manner.

Come help us build, maintain and scale the infrastructure behind products that delight our customers on reliability, latency and cost.

What You'll Do:

Set high standards for Reliability at Alchemy
Develop and own company-wide Reliability best practices such as SLO definition, incident management, postmortem reviews, launch readiness reviews, and change management
Architect production infrastructure and tools that encourage and enforce high reliability
Inspire the broader engineering organization to ensure Reliability is a first-class citizen in the products we build
Collaborate with, advise, review and mentor engineering teams on Reliability topics such as high-reliability architecture, observability, and safe change management
Improve the critical systems used to operate infrastructure at scale (e.g., compute, networking, deployment, observability, code tooling and libraries)
Develop and own best practices for managing production infrastructure: provisioning, application scaling, configuration management, capacity planning, monitoring, etc.
Develop and own best practices for developer processes: CI/CD, dev and staging environments, etc.
Provide input into long-term platform requirements and operational guidelines with a focus on reliability
Continuously raise our standard of engineering excellence by implementing best practices for coding, testing, and deployment
Build and maintain documentation for processes and workflows

What We're Looking For:

5+ years of experience as an Infrastructure Engineer focused on Reliability (e.g., Site Reliability Engineer, Production Engineer, Platform Engineer)

Experience leading and driving company-wide reliability efforts and engineering initiatives

Experience with observability best practices and tooling like Prometheus, Grafana and Datadog

Experience designing and operating large-scale, multi-region production systems

Experience working with AWS or other cloud infrastructures

Experience with container schedulers and runtimes such as Kubernetes and Docker

Experience with Infrastructure-as-Code (e.g., Terraform, Pulumi, Chef, Puppet)

The cross-functional nature of this role requires strong communication and collaboration skills

(Preferred) Experience running production services on bare metal

(Preferred) Experience with TypeScript and Python

(Preferred) Excellent understanding of web applications and architecture

More on The Role

Alchemy is committed to offering competitive compensation, including base salary as well as equity. Additionally, Alchemy offers comprehensive medical, dental, and vision coverage, as well as other benefits such as 401k and unlimited flexible time off.

The base salary range for this position is estimated to be between $135,000 and $275,000 annually. Please note this range reflects base salary only and does not include bonus, equity, or benefits. Your salary will be determined by various factors, including relevant experience, skill set, qualifications, and other business needs.
