Talent.com
Senior Site Reliability Engineer (Cloud Infra)
Mumba Technologies, Inc. • Palo Alto, CA, United States
1 day ago
Job type
  • Full-time
Job description

About the Role

We are seeking a highly skilled Senior Site Reliability Engineer to join our team. In this role, your responsibilities will include designing and implementing infrastructure automation and continuous integration and delivery pipelines, as well as monitoring and scaling the infrastructure that powers our healthcare AI platform. You will work closely with software engineers, research scientists, and other cross-functional teams to develop and maintain reliable, scalable infrastructure that enables rapid iteration and deployment of our products.

Key Responsibilities

  • Design and implement infrastructure automation and deployment pipelines using tools such as Terraform
  • Implement and maintain monitoring and logging systems to ensure the reliability and performance of our healthcare AI platform
  • Work closely with software engineers to design and deploy scalable, fault-tolerant, and secure production systems on cloud platforms such as AWS, GCP, or Azure
  • Develop and maintain security and compliance policies and procedures for our healthcare AI platform
  • Collaborate with cross-functional teams to troubleshoot and resolve complex issues related to infrastructure, deployment, and operations
  • Implement and maintain disaster recovery and business continuity plans
  • Develop and maintain documentation related to infrastructure, deployment, and operations
  • Mentor and provide technical guidance to junior engineers

Qualifications

  • Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field
  • At least 5 years of professional experience as an SRE
  • Strong skills in building cloud infrastructure orchestration systems (Operators) using Python and Go
  • Expertise in infrastructure automation and deployment tools such as Terraform or GitLab CI/CD
  • Experience with cloud platforms such as AWS, GCP, or Azure
  • Strong knowledge of containerization technologies such as Docker and Kubernetes
  • Experience with monitoring and logging tools such as ELK, Grafana, or Datadog
  • Familiarity with security and compliance best practices and tools such as HashiCorp Vault, AWS KMS, or Azure Key Vault
  • Strong problem-solving skills and ability to work independently and collaboratively in a team environment
  • Excellent communication and interpersonal skills
  • Experience implementing HIPAA and SOC 2 compliance is a plus
  • Experience working in an HPC environment is a plus