Talent.com
Senior Site Reliability Engineer (Cloud Infra)

Mumba Technologies, Inc. • Palo Alto, CA, US
1 day ago
Job type
  • Full-time
Job description

About the Role

We are seeking a highly skilled Senior Site Reliability Engineer to join our team. In this role, you will design and implement infrastructure automation and continuous integration and delivery pipelines, and you will monitor and scale the infrastructure that powers our healthcare AI platform. You will work closely with software engineers, research scientists, and other cross-functional teams to develop and maintain reliable, scalable infrastructure that enables rapid iteration and deployment of our products.

Key Responsibilities

  • Design and implement infrastructure automation and deployment pipelines using tools such as Terraform
  • Implement and maintain monitoring and logging systems to ensure the reliability and performance of our healthcare AI platform
  • Work closely with software engineers to design and deploy scalable, fault-tolerant, and secure production systems on cloud platforms such as AWS, GCP, or Azure
  • Develop and maintain security and compliance policies and procedures for our healthcare AI platform
  • Collaborate with cross-functional teams to troubleshoot and resolve complex issues related to infrastructure, deployment, and operations
  • Implement and maintain disaster recovery and business continuity plans
  • Develop and maintain documentation related to infrastructure, deployment, and operations
  • Mentor and provide technical guidance to junior engineers

Qualifications

  • Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field
  • At least 5 years of professional experience as an SRE
  • Strong skills in building cloud infrastructure orchestration systems (Operators) using Python or Go
  • Expertise in infrastructure automation and deployment tools such as Terraform or GitLab CI/CD
  • Experience with cloud platforms such as AWS, GCP, or Azure
  • Strong knowledge of containerization technologies such as Docker and Kubernetes
  • Experience with monitoring and logging tools such as ELK, Grafana, or Datadog
  • Familiarity with security and compliance best practices and tools such as HashiCorp Vault, AWS KMS, or Azure Key Vault
  • Strong problem-solving skills and ability to work independently and collaboratively in a team environment
  • Excellent communication and interpersonal skills
  • Experience implementing HIPAA and SOC 2 compliance is a plus
  • Experience working in an HPC environment is a plus