Talent.com
Software Engineer (Site Reliability Engineer)
Anyscale • San Francisco, California, United States
30+ days ago
Job type
  • Full-time
Job description

About Anyscale:

At Anyscale, we're on a mission to democratize distributed computing and make it accessible to software developers of all skill levels. We’re commercializing Ray, a popular open-source project that's creating an ecosystem of libraries for scalable machine learning. Companies like OpenAI, Uber, Spotify, Instacart, Cruise, and many more have Ray in their tech stacks to accelerate the progress of AI applications out into the real world.

With Anyscale, we’re building the best place to run Ray, so that any developer or data scientist can scale an ML application from their laptop to the cluster without needing to be a distributed systems expert.

We are proud to be backed by Andreessen Horowitz, NEA, and Addition, with $250+ million raised to date.

About the role:

As a Site Reliability Engineer, you will play a crucial role in ensuring the smooth operation of all user-facing services and other Anyscale production systems. Anyscale values diversity and inclusion, and we encourage applications from individuals of all backgrounds.

The role includes processes for provisioning, negotiating prices, and managing costs, as well as finding opportunities across the company for teams to reduce waste. You will apply sound engineering principles, operational discipline, and mature automation to our environments and the Anyscale codebase as we scale.

As part of this role, you will:

Develop a unified perspective on how cloud components are utilized across the company, taking into account diverse needs and requirements.

Ensure that deployment methodologies align with the company's reliability goals.

Build systems that promote understanding of production environments, facilitating quick identification of issues through robust observability infrastructure for metrics, logging, and tracing.

Create monitoring and alerting systems at different levels, enabling teams to easily contribute and enhance the overall monitoring capabilities.

Establish testing infrastructure to support the team in writing and executing tests effectively.

Develop tools for measuring service level objectives (SLOs) and define organization-wide SLOs.

Implement best practices and on-call systems, ensuring efficient incident management and up-leveling the incident management system at Anyscale.

Coordinate the creation and deployment of cloud-based services, including tracking deployments and establishing effective communication channels for issue resolution.

We'd love to hear from you if you have:

At least 3 years of relevant work experience in a similar role.

Compensation

At Anyscale, we take a market-based approach to compensation. We are data-driven, transparent, and consistent. As the market data changes over time, the target salary for this role may be adjusted.

This role is also eligible to participate in Anyscale's Equity and Benefits offerings, including the following:

Stock Options

Healthcare plans, with premiums covered by Anyscale at 99%

401k Retirement Plan

Wellness stipend

Education stipend

Paid Parental Leave

Flexible Time Off

Commute reimbursement

100% of in-office meals covered

Anyscale Inc. is an Equal Opportunity Employer. Candidates are evaluated without regard to age, race, color, religion, sex, disability, national origin, sexual orientation, veteran status, or any other characteristic protected by federal or state law.

Anyscale Inc. is an E-Verify company, and you may review the Notice of E-Verify Participation and the Right to Work posters in English and Spanish.
