Talent.com
Senior Site Reliability Engineer (Cloud Infra)

Mumba Technologies, Inc. • Palo Alto, CA, US
3 days ago
Job type
  • Full-time
Job description

About the Role

We are seeking a highly skilled Senior Site Reliability Engineer to join our team. In this role, you will design and implement infrastructure automation and continuous integration and delivery pipelines, and you will monitor and scale the infrastructure that powers our healthcare AI platform. You will work closely with software engineers, research scientists, and other cross-functional teams to develop and maintain reliable, scalable infrastructure that enables rapid iteration and deployment of our products.

Key Responsibilities

  • Design and implement infrastructure automation and deployment pipelines using tools such as Terraform
  • Implement and maintain monitoring and logging systems to ensure the reliability and performance of our healthcare AI platform
  • Work closely with software engineers to design and deploy scalable, fault-tolerant, and secure production systems on cloud platforms such as AWS, GCP, or Azure
  • Develop and maintain security and compliance policies and procedures for our healthcare AI platform
  • Collaborate with cross-functional teams to troubleshoot and resolve complex issues related to infrastructure, deployment, and operations
  • Implement and maintain disaster recovery and business continuity plans
  • Develop and maintain documentation related to infrastructure, deployment, and operations
  • Mentor and provide technical guidance to junior engineers

Qualifications

  • Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field
  • At least 5 years of professional experience as an SRE
  • Strong skills in building cloud infrastructure orchestration systems (Operators) using Python or Go
  • Expertise in infrastructure automation and deployment tools such as Terraform or GitLab CI/CD
  • Experience with cloud platforms such as AWS, GCP, or Azure
  • Strong knowledge of containerization technologies such as Docker and Kubernetes
  • Experience with monitoring and logging tools such as ELK, Grafana, or Datadog
  • Familiarity with security and compliance best practices and tools such as HashiCorp Vault, AWS KMS, or Azure Key Vault
  • Strong problem-solving skills and ability to work independently and collaboratively in a team environment
  • Excellent communication and interpersonal skills
  • Experience implementing HIPAA and SOC 2 compliance is a plus
  • Experience working in an HPC environment is a plus