Talent.com
Senior Site Reliability Engineer (Cloud Infra)

Mumba Technologies, Inc. • Palo Alto, CA, United States
15 hours ago
Job type
  • Full-time
Job description

About the Role

We are seeking a highly skilled Senior Site Reliability Engineer to join our team. In this role, you will design and implement infrastructure automation, build continuous integration and delivery pipelines, and monitor and scale the infrastructure that powers our healthcare AI platform. You will work closely with software engineers, research scientists, and other cross-functional teams to develop and maintain reliable, scalable infrastructure that enables rapid iteration and deployment of our products.

Key Responsibilities

  • Design and implement infrastructure automation and deployment pipelines using tools such as Terraform
  • Implement and maintain monitoring and logging systems to ensure the reliability and performance of our healthcare AI platform
  • Work closely with software engineers to design and deploy scalable, fault-tolerant, and secure production systems on cloud platforms such as AWS, GCP, or Azure
  • Develop and maintain security and compliance policies and procedures for our healthcare AI platform
  • Collaborate with cross-functional teams to troubleshoot and resolve complex issues related to infrastructure, deployment, and operations
  • Implement and maintain disaster recovery and business continuity plans
  • Develop and maintain documentation related to infrastructure, deployment, and operations
  • Mentor and provide technical guidance to junior engineers

Qualifications

  • Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field
  • At least 5 years of professional experience as an SRE
  • Strong skills in building cloud infra orchestration systems (Operators) using Python or Go
  • Expertise in infrastructure automation and deployment tools such as Terraform or GitLab CI/CD
  • Experience with cloud platforms such as AWS, GCP, or Azure
  • Strong knowledge of containerization technologies such as Docker and Kubernetes
  • Experience with monitoring and logging tools such as ELK, Grafana, or Datadog
  • Familiarity with security and compliance best practices and tools such as HashiCorp Vault, AWS KMS, or Azure Key Vault
  • Strong problem-solving skills and ability to work independently and collaboratively in a team environment
  • Excellent communication and interpersonal skills
  • Experience implementing HIPAA and SOC2 compliance is a plus
  • Experience working in an HPC environment is a plus