Senior Site Reliability Engineer (Cloud Infra)
Mumba Technologies, Inc. • Palo Alto, CA, United States
17 hours ago
Job type
  • Full-time
Job description

About the Role

We are seeking a highly skilled Senior Site Reliability Engineer to join our team. In this role, you will design and implement infrastructure automation and continuous integration and delivery pipelines, and monitor and scale the infrastructure that powers our healthcare AI platform. You will work closely with software engineers, research scientists, and other cross-functional teams to develop and maintain reliable and scalable infrastructure that enables rapid iteration and deployment of our products.

Key Responsibilities

  • Design and implement infrastructure automation and deployment pipelines using tools such as Terraform
  • Implement and maintain monitoring and logging systems to ensure the reliability and performance of our healthcare AI platform
  • Work closely with software engineers to design and deploy scalable, fault-tolerant, and secure production systems on cloud platforms such as AWS, GCP, or Azure
  • Develop and maintain security and compliance policies and procedures for our healthcare AI platform
  • Collaborate with cross-functional teams to troubleshoot and resolve complex issues related to infrastructure, deployment, and operations
  • Implement and maintain disaster recovery and business continuity plans
  • Develop and maintain documentation related to infrastructure, deployment, and operations
  • Mentor and provide technical guidance to junior engineers

Qualifications

  • Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field
  • At least 5 years of professional experience as an SRE
  • Strong skills in building cloud infrastructure orchestration systems (Operators) using Python or Go
  • Expertise in infrastructure automation and deployment tools such as Terraform or GitLab CI/CD
  • Experience with cloud platforms such as AWS, GCP, or Azure
  • Strong knowledge of containerization technologies such as Docker and Kubernetes
  • Experience with monitoring and logging tools such as ELK, Grafana, or Datadog
  • Familiarity with security and compliance best practices and tools such as HashiCorp Vault, AWS KMS, or Azure Key Vault
  • Strong problem-solving skills and ability to work independently and collaboratively in a team environment
  • Excellent communication and interpersonal skills
  • Experience implementing HIPAA and SOC2 compliance is a plus
  • Experience working in an HPC environment is a plus