Talent.com
Staff Site Reliability Engineer, Storage
No longer accepting applications

Epoch Biodesign • San Francisco, CA, US
5 days ago
Job type
  • Full-time
Job description

Crusoe is building the World's Favorite AI-first Cloud infrastructure company. We're pioneering vertically integrated, purpose-built AI infrastructure solutions trusted by Fortune 500 companies to power their most advanced AI applications. Crusoe is redefining AI cloud infrastructure, with a mission to align the future of computing with the future of the climate. Our AI platform is recognized as the "gold standard" for reliability and performance. Our data centers are optimized for AI workloads and are powered by clean, renewable energy.

Be part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that's setting the pace for responsible, transformative cloud infrastructure.

About This Role

At Crusoe Energy Systems, our Site Reliability Engineering (SRE) team plays a pivotal role in ensuring the reliability and performance of our infrastructure. SRE at Crusoe is dedicated to detecting, analyzing, and preventing issues so that we meet our Service Level Agreements (SLAs), tracking Service Level Indicators (SLIs) against Service Level Objectives (SLOs). Through automation and proactive remediation, our SREs not only resolve common errors automatically but also advise engineering teams on building resilient code. We prioritize anticipating and resolving issues before they impact our customers, conducting thorough post-mortems, and driving continuous improvement. Our customer-centric approach ensures that clients always have access to the virtual machines they depend on. Join us to help build and maintain the robust systems that power Crusoe's innovative solutions.

A Day in the Life

As a Site Reliability Engineer at Crusoe Energy Systems, your day begins with a review of overnight alerts and system performance metrics to ensure everything is running smoothly. You will collaborate with your team in a morning stand-up meeting to discuss ongoing projects, recent incidents, and priorities for the day. Your tasks might include automating routine processes, analyzing system logs, and developing tools to enhance our monitoring capabilities. You'll spend part of your day working closely with software engineers, advising on best practices for resilient code and reviewing changes before deployment. You will also regularly take part in incident response drills, post-mortems, and root cause analysis sessions to learn from past issues and prevent future ones. Throughout the day, you will stay focused on keeping SLIs within their SLOs, ensuring that our infrastructure remains robust and reliable for our customers. By day's end, you will document your work, share insights with your team, and plan for the next day's challenges, always with a customer-centric mindset.

You Will Thrive In This Role If

8+ years of professional SRE experience

8+ years of experience contributing to the architecture and design (design patterns, reliability, and scaling) of new and existing systems

Bachelor's Degree in Computer Science or related field, or 10+ years relevant work experience

Solid understanding of infrastructure design, including the operational trade-offs of various designs

Experience writing high-quality code in at least one programming language (Python, Go, or similar)

Experience building with modern infrastructure tools such as Docker, Kubernetes, Ansible, CloudFormation, and Terraform

Experience building with modern CI/CD practices and build systems, such as GitLab CI/CD, CircleCI, and GitHub Actions

Experience with logging, monitoring and alerting systems and tools

Experience with Unix/Linux environments

Experience with TCP/IP and network programming

Experience with information security best practices

Excellent communication skills

Must be able to pass a background check

Embody the Company values

Benefits

Hybrid work schedule

Industry competitive pay

Restricted Stock Units in a fast-growing, well-funded technology company

Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents

Employer contributions to HSA accounts

Paid Parental Leave

Paid life insurance, short-term and long-term disability

Teladoc

401(k) with a 100% match up to 4% of salary

Generous paid time off and holiday schedule

Cell phone reimbursement

Tuition reimbursement

Subscription to the Calm app

MetLife Legal

Company-paid commuter benefit: $50 per pay period

Compensation Range

Compensation will be paid up to $250,000 base salary. Restricted Stock Units are included in all offers. Compensation to be determined by the applicant's education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.

Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.

Related jobs
Staff Site Reliability Engineer

Altana AI • San Francisco, CA, United States
Full-time
AI can be a powerful tool for good in the world – at Altana we apply AI to the world’s largest organized body of supply chain data to power a more resilient, more secure, and more sustainable model...
Last updated: 30+ days ago
Site Reliability Engineer

PsiQuantum • Palo Alto, CA, United States
Full-time
Quantum computing holds the promise of humanity's mastery over the natural world, but only if we can build a... PsiQuantum is on a mission to build the first real, useful quantum computers, capable of...
Last updated: 30+ days ago
Senior Site Reliability Engineer, Storage

Epoch Biodesign • San Francisco, CA, United States
Full-time
Crusoe Energy is on a mission to unlock value in stranded energy resources through the power of computation. Take a look at what we do! - https://www. We aim to align the long term interests of the c...
Last updated: 30+ days ago
Site Reliability Engineer - Storage

xAI • San Francisco, CA, United States
Full-time
xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excelle...
Last updated: 30+ days ago
Staff Site Reliability Engineer

Checkr • San Francisco, CA, United States
Full-time
Checkr is building the data platform to power safe and fair decisions. Established in 2014, Checkr’s innovative technology and robust data platform help customers assess risk and ensure safety and c...
Last updated: 23 days ago
Staff Engineer, Site Reliability

Zapier • San Francisco, CA, United States
Full-time
Zapier is building a platform to help millions of businesses globally scale with automation and AI. Our mission is to make automation work for everyone by delivering products that delight our custom...
Last updated: 30+ days ago
Senior Staff Site Reliability Engineer - Platform

Quizlet • San Francisco, CA, United States
Full-time
At Quizlet, our mission is to help every learner achieve their outcomes in the most effective and delightful way. Our $1B+ learning platform serves tens of millions of students every month, includin...
Last updated: 6 days ago
Senior/Staff Site Reliability Engineer

Crusoe • San Francisco, CA, United States
Full-time
Crusoe Energy is on a mission to unlock value in stranded energy resources through the power of computation. We aim to align the long-term interests of the climate with the future of global computin...
Last updated: 30+ days ago
Staff Site Reliability Engineer, Platform

Gemini • San Francisco, CA, United States
Full-time
Gemini is a global crypto and Web3 platform founded by Cameron and Tyler Winklevoss in 2014, offering a wide range of simple, reliable, and secure crypto products and services to individuals and in...
Last updated: 30+ days ago
Staff/Principal Site Reliability Engineer

The Resume Database • Redwood City, CA, United States
Full-time
You’ll architect scalable solutions, navigate complex technical challenges independently, and deliver results und...
Last updated: 8 days ago
Senior Site Reliability Engineer

Alembic Technologies • San Francisco, CA, United States
Full-time
This range is provided by Alembic Technologies. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more. We’re looking fo...
Last updated: 5 days ago
Senior Staff Site Reliability Engineer - Platform

Icon Ventures • San Francisco, CA, United States
Full-time
At Quizlet, our mission is to help every learner achieve their outcomes in the most effective and delightful way. Our $1B+ learning platform serves tens of millions of students every month, includin...
Last updated: 6 days ago
Staff Site Reliability Engineer

Grindr • Palo Alto, CA, United States
Full-time
This range is provided by Grindr. Your actual pay will be based on your skills and experience; talk wit...
Last updated: 8 days ago
Site Reliability Engineer - Storage

Pantera Capital • Palo Alto, CA, United States
Full-time
xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excelle...
Last updated: 8 days ago
Site Reliability Engineer

Speak • San Francisco, CA, United States
Full-time
Our mission is to reinvent the way people learn, starting with language. Learning a language can change a life by opening doors to new cultures, careers, and communities. Two billion people around th...
Last updated: 5 days ago
Senior Site Reliability Engineer, Storage

Crusoe • San Francisco, CA, United States
Full-time
Crusoe is building the World’...
Last updated: 30+ days ago
Staff Site Reliability Engineer - Platform

Icon Ventures • San Francisco, CA, United States
Full-time
At Quizlet, our mission is to help every learner achieve their outcomes in the most effective and delightful way. Our $1B+ learning platform serves tens of millions of students every month, includin...
Last updated: 5 days ago
Staff Site Reliability Engineer, Platform

Gemini Trust Company • San Francisco, CA, United States
Full-time
Gemini is a global crypto and Web3 platform founded by Cameron and Tyler Winklevoss in 2014, offering a wide range of simple, reliable, and secure crypto products and services to individuals and in...
Last updated: 30+ days ago