Talent.com
Principal Site Reliability Engineer

Early Warning Services LLC • San Francisco, CA, United States
15 hours ago
Job type
  • Full-time
Job description

Positions located in Scottsdale, San Francisco, Chicago, or New York follow a hybrid work model to allow for a more collaborative working environment. Candidates responding to this posting must independently possess the eligibility to work in the United States, for any employer, at the date of hire. This position is ineligible for employment visa sponsorship.

  • Overall Purpose
  • The Principal Site Reliability Engineer partners with development teams by designing availability and resiliency patterns in applications and infrastructure.
  • Essential Functions:
  • Design and implement software and tools to improve performance (availability, scalability, and latency) while delivering end products to customers with the highest efficiency and meeting all security standards.
  • Support the company’s commitment to risk management and protecting the integrity and confidentiality of systems and data.
  • Build automation and tooling around application management, such as deployments, configuration changes and disaster recovery scenarios.
  • Design, implement, and evangelize observability and monitoring systems to proactively detect problems and identify their causes.
  • Evaluate capacity of the application on a continuous basis to provide stats to the Product / Business teams and recommend an efficient path to scale for future needs.
  • Identify performance bottlenecks and work with cross-functional teams to troubleshoot and resolve issues.
  • Serve as a technical liaison for the application and provide documents and runbooks to Level 1 and Level 2 teams.
  • Participate in a 24x7 on-call rotation.
  • Be a champion of excellent processes; take the initiative in developing repeatable patterns and standard, re-usable work across teams.
  • Work directly with application development teams to provide feedback and technical requirements to the software development lifecycle, implementing best-practice microservice design patterns and other modern software development approaches.
  • Understand and support the adoption of best-practice microservice design patterns and other modern software reliability approaches and techniques.
  • Be a thought leader: a senior point of expertise on site reliability engineering issues, industry trends, and developing technologies. Be a role model to others on the team. Coach and mentor team members.
  • Minimum Qualifications
  • Education and experience typically obtained through completion of a Bachelor’s Degree in Business and / or Computer Science or related field.
  • 12+ years of related experience managing large, complex projects in a technical or software development environment, inclusive of a post-graduate degree
  • Proven ability to lead a team through high-priority incidents and improve the RCA process
  • Excellent troubleshooting skills and proven experience resolving technical issues in complex environments
  • Hands-on experience designing and developing with one or more of the following technologies: Python, Go, Java; Docker; microservices architecture; messaging frameworks such as Kafka, SQS, or JMS; database technologies such as Oracle, DynamoDB, and Aurora; caching layers such as Redis and memcached
  • Strong understanding of Linux administration
  • Experience with CI/CD pipeline implementation, including Git, Chef, Maven, and Jenkins
  • Strong understanding of networking fundamentals
  • Experience in leading cross-functional teams to create technical solutions.
  • Proven track record designing and building complex end-to-end systems (full stack developer)
  • Background and drug screen
  • Preferred Qualifications
  • Good programming skills in one or more of the following languages: Java, Ruby, Python, JavaScript, and Go
  • Hands-on experience in supporting applications in a 24x7 customer-facing production environment.
  • Working knowledge of AWS, Docker, Kubernetes, and Swarm
  • The base pay scale for this position in Phoenix, AZ / Chicago, IL, in USD per year, is $172,000 - $215,000. In New York, NY / San Francisco, CA, in USD per year, it is $206,000 - $258,000. Additionally, candidates are eligible for a discretionary incentive plan and benefits.
  • Physical Requirements
  • Employee must be able to perform essential functions and physical requirements of position with or without reasonable accommodation. Candidates responding to this posting must independently possess the eligibility to work in the United States at the date of hire.
  • Some of the Ways We Prioritize Your Health and Happiness
  • Healthcare Coverage – Competitive medical (PPO / HDHP), dental, and vision plans as well as company contributions to your Health Savings Account (HSA) or pre-tax savings through flexible spending accounts (FSA) for commuting, health & dependent care expenses.
  • 401(k) Retirement Plan – Featuring a 100% Company Safe Harbor Match on your first 6% deferral immediately upon eligibility.
  • Paid Time Off – Unlimited Time Off for Exempt (salaried) employees, as well as generous PTO for Non-Exempt (hourly) employees, plus 11 paid company holidays and a paid volunteer day.
  • 12 weeks of Paid Parental Leave
  • Maven Family Planning – provides support through your parenting journey, including egg freezing, fertility, adoption, surrogacy, pregnancy, postpartum, early pediatrics, and returning to work.
  • And SO much more! We continue to enhance our program, so be sure to check for the latest. Our team can share more during the interview process!
  • Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.
  • CURRENT EMPLOYEES: Apply for open positions via Job Hub in your Workday Account.
  • for an assistance request.
  • E-Verify
  • ## Privacy Notice
  • Effective: May 2, 2025
  • This privacy notice is intended to inform California residents of the personal information we collect, how it’s used and disclosed, and the rights you have in regard to such information.

