Site Reliability Engineer

ID.me • Mountain View, California, United States
30+ days ago
Job type
  • Full-time
Job description

Company Overview

ID.me is the next-generation digital identity wallet that simplifies how individuals securely prove their identity online. Consumers can verify their identity with ID.me once and seamlessly log in across websites without having to create a new login and verify their identity again. Over 152 million users experience streamlined login and identity verification with ID.me at 20 federal agencies, 45 state government agencies, and 70+ healthcare organizations. More than 600 consumer brands use ID.me to verify communities and user segments to honor service and build more authentic relationships. ID.me’s technology meets the federal standards for consumer authentication set by the Commerce Department and is approved as a NIST 800-63-3 IAL2/AAL2 credential service provider by the Kantara Initiative. ID.me is committed to “No Identity Left Behind” to enable all people to have a secure digital identity. To learn more, visit https://network.id.me/.

Role Overview

We are seeking a Site Reliability Engineer to join our Core Platform Engineering organization. The SRE team builds the automation, observability, and operational foundations that ensure ID.me’s services are reliable, scalable, and secure.

As an SRE, you will design and implement systems that improve uptime, resilience, and developer productivity. You’ll focus on infrastructure automation, observability, performance optimization, and incident response, partnering closely with Software Engineers, Platform Engineers, and Security teams to embed reliability best practices throughout the product lifecycle.

This role is based out of our Mountain View, CA or McLean, VA offices and requires full-time in-office attendance.

Responsibilities

  • Build and maintain automated reliability tooling, infrastructure as code, and observability systems that enhance uptime and service performance.
  • Develop monitoring, logging, and alerting frameworks (e.g., Prometheus, Grafana, OpenTelemetry) to detect and remediate issues proactively.
  • Partner with engineering teams to design and implement scalable, fault-tolerant systems that meet defined SLIs and SLOs.
  • Automate repetitive operational tasks and develop self-healing and auto-remediation mechanisms to minimize human intervention.
  • Participate in on-call rotations and lead incident response efforts, performing post-incident reviews and driving systemic improvements.
  • Improve the deployment and release process using CI/CD pipelines and progressive delivery techniques to ensure stability and safety.
  • Champion observability, reliability, and operational readiness reviews as part of the development process.
  • Collaborate with Security and Compliance teams to ensure production systems meet FedRAMP, NIST, and internal policy requirements.
  • Contribute to documentation, runbooks, and internal tooling to enhance knowledge sharing and operational maturity across teams.

Minimum Qualifications

  • Bachelor’s degree in Computer Science, Software Engineering, or a related technical field.
  • 3-5 years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering.
  • Proficiency in at least one modern programming language (e.g., Java, Go, Python, Ruby, JavaScript).

Preferred Qualifications

  • Hands-on experience managing and scaling services in cloud environments such as AWS, GCP, or Azure.
  • Strong understanding of containerization and orchestration technologies (Docker, Kubernetes).
  • Experience implementing and maintaining CI/CD pipelines and automation frameworks.
  • Working knowledge of observability systems: metrics, tracing, logging, and alerting.
  • Experience building automated recovery, failover, or chaos-engineering systems to validate reliability.
  • Familiarity with event-driven architecture and asynchronous processing systems.
  • Knowledge of distributed systems design, load balancing, and performance optimization.
  • Exposure to infrastructure-as-code tools (Terraform, Pulumi, Ansible) and GitOps practices.
  • Understanding of security and compliance frameworks (FedRAMP, SOC 2, or NIST 800-53).
  • Strong analytical and troubleshooting skills across the stack, from the network layer to the application layer.
  • Excellent communication and documentation skills, with a focus on cross-team collaboration and continuous improvement.

The annual base salary listed below does not include a company bonus, incentives for sales roles, equity, or benefits, which will be determined based on experience, skills, education, relevant training, geographic location, and role.

ID.me offers comprehensive medical, dental, vision, health savings account, flexible spending accounts (medical, limited purpose, dependent care, commuter benefit accounts), basic and voluntary life and AD&D insurance, 401(k) with company match, parental leave, ability to participate in unlimited paid time off subject to the terms and conditions of the PTO policy, including 8 company-wide holidays, short and long-term disability insurance, accident and critical illness insurance, referral bonus policy, employee assistance program, pet insurance, travel assistant program, wellbeing and childcare discounts, benefit advocates, and a learning and development benefit.

The above represents the anticipated total rewards package for this job requisition. Final offers may vary from the amount listed based on qualifications, professional experience, skills, education, relevant training, geographic location, and other job-related factors.

Mountain View, CA Pay Range

$168,926 - $192,500 USD

ID.me maintains a work environment free from discrimination, where employees are treated with dignity and respect. All ID.me employees share in the responsibility for fulfilling our commitment to equal employment opportunity. ID.me does not discriminate against any employee or applicant on the basis of age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. ID.me adheres to these principles in all aspects of employment, including recruitment, hiring, training, compensation, promotion, benefits, social and recreational programs, and discipline. In addition, ID.me's policy is to provide reasonable accommodation to qualified employees who have protected disabilities to the extent required by applicable laws, regulations and ordinances where a particular employee works. Upon request we will provide you with more information about such accommodations.

Please review our Privacy Policy, including our CCPA policy, at id.me/privacy. If you provide ID.me with any personally identifiable information, you confirm that you have read and agree to be bound by the terms and conditions set out in our Privacy Policy.

ID.me participates in E-Verify.