Site Reliability Engineer

Neara • Palo Alto, CA, United States

4 hours ago

Job type

Full-time

Job description

Job type : Full Time

Department : Backend Engineer
Work type : Remote

About Archetype AI

Archetype AI is developing the world's first AI platform to bring AI into the real world. Formed by an exceptionally high-caliber team from Google, Archetype AI is building a foundation model for the physical world, a real-time multimodal LLM for real life, transforming real-world data into valuable insights and knowledge that people will be able to interact with naturally. It will help people in their real lives, not just online, because it understands the real-time physical environment and everything that happens in it.

Supported by deep tech venture funds in Silicon Valley, Archetype AI is currently pre-Series A, progressing rapidly to develop technology for their next stage. This presents a unique and once-in-a-lifetime opportunity to be part of an exciting AI team at the beginning of their journey, located in the heart of Silicon Valley.

Our team is headquartered in Palo Alto, California, with team members throughout the US and Europe.

We are actively growing, so if you are an exceptional candidate excited to work on the cutting edge of physical AI and don’t see a role below that exactly fits you, you can contact us directly with your resume via jobs@archetypeai.io.

About the Role

As a Site Reliability Engineer (SRE) at Archetype AI, you will be responsible for designing, scaling, and maintaining the infrastructure that powers our AI-driven products. You will collaborate with backend engineers and ML researchers to ensure that our distributed platforms are fault-tolerant, performant, and highly available.

Core Responsibilities

Design, build, and operate highly available distributed systems.

Collaborate with engineering and ML teams to ensure reliable deployment of backend services (in Rust, C++ or similar).

Implement monitoring, alerting, and observability solutions across infrastructure.

Automate deployments, scaling, and infrastructure provisioning using infrastructure-as-code.

Diagnose and resolve performance bottlenecks, system outages, and production incidents.

Support AI / ML infrastructure for training and serving models at scale, including GPU clusters, pipelines, and inference services.

Contribute to infrastructure architecture, standards, and operational best practices.

Minimum Qualifications

5+ years of experience as an SRE, DevOps engineer, or systems engineer.

Strong expertise in distributed systems, fault-tolerant architectures, and large-scale production environments.

Proficiency in Rust, C++, or another backend language, with a willingness to learn new ones.

Solid experience with Kubernetes, containers, and cloud platforms (AWS, GCP, Azure).

Hands‑on experience with monitoring and observability tools (Prometheus, Grafana, ELK, OpenTelemetry).

Experience with data pipelines, messaging systems, and streaming technologies (Kafka, Pulsar, etc.).

Familiarity with AI / ML infrastructure (training pipelines, GPU clusters, inference systems).

Strong debugging, problem‑solving, and automation mindset (Terraform, Ansible, Pulumi, scripting).

Excellent communication and collaboration skills.

Preferred Qualifications

Experience with real‑time or low‑latency systems.

Open‑source contributions to distributed systems or infrastructure projects.

Knowledge of security best practices for distributed environments.

Experience with edge or embedded systems and sensor‑based infrastructure.

Background in multimodal data fusion or physical‑world perception systems.

What We Value

Ownership – You take initiative, follow through, and care deeply about quality and outcomes.

Motivation – You’re driven to solve complex problems and continuously raise the bar for yourself and your team.

Excellence – You bring discipline, clarity, and rigor to your craft—and help others do the same.

Collaboration – You work well with others, mentor generously, and contribute to a high‑trust, high‑performance culture.
