Talent.com
Cloud Site Reliability Engineer (SRE)

Promise, Oakland, CA, United States
2 days ago
Job type
  • Permanent
Job description

Promise empowers utilities and government agencies to create flexible, affordable solutions for individuals struggling with debt. Our innovative approach to payment plans and relief distribution significantly improves enrollment and recovery rates, helping individuals clear debts faster and reducing delinquencies for our partners.

We treat people facing financial difficulties with respect and dignity, providing the tools and resources they need to thrive. Our team includes experts from companies like Palantir, Google, Stripe, and esteemed government leaders.

Backed by over $50 million in funding from top investors such as 8VC, Kapor Capital, XYZ Ventures, and Howard Schultz, we've been recognized as one of Fast Company's "World's Most Innovative Companies of 2022."

We're looking for a Cloud Site Reliability Engineer (SRE) to build, operate, and optimize the infrastructure that powers our products. You'll be responsible for ensuring high reliability, performance, and scalability of our cloud-based systems. The ideal candidate is self-sufficient, detail-oriented, and execution-driven, with a strong background in software development, site reliability engineering (SRE), and infrastructure-as-code (IaC).

You'll collaborate closely with product and engineering teams to improve system architecture, troubleshoot issues, and automate operational processes. This role is ideal for someone who thrives in a hard-working, fast-moving environment, enjoys solving complex technical challenges, and takes personal responsibility for ensuring security outcomes are achieved and aligned to business goals.

What You'll Do

  • Design, implement, and manage cloud infrastructure to ensure reliability, scalability, and security.
  • Automate infrastructure and operations using Terraform, scripting, and configuration management tools.
  • Develop strong relationships with engineering teams to define system reliability goals and best practices.
  • Troubleshoot and resolve complex network and system issues using observability tools, stack traces, and system logs.
  • Monitor and optimize system performance, implementing best practices for high availability and disaster recovery.
  • Formalize a security design review process and guide the Engineering team through it.
  • Ensure the security and stability of Linux-based production systems.
  • Provide essential support in aligning our technology projects with compliance requirements, navigating the complexities of state and federal regulations, while fostering an environment of innovation.
  • Serve as a bridge between technical teams and non-technical stakeholders, translating security and compliance needs into actionable plans that support our broader business objectives.

What Will Enable You

  • 4+ years of experience in Linux system administration, managing large-scale production environments.
  • Strong debugging skills, with experience in performance tuning, observability, and system-level troubleshooting.
  • Hands-on experience with cloud platforms (AWS, Azure, or GCP).
  • Expertise in Infrastructure-as-Code (IaC) using Terraform or similar tools.
  • Proficiency in monitoring tools (e.g., Prometheus, Datadog) and health check implementation.
  • Experience with containerization (Docker, Podman, Kubernetes).
  • Scripting experience (Python, Bash, or equivalent) to automate infrastructure management.
  • Knowledge of networking and security best practices for cloud environments.

Promise is an equal opportunity employer and does not discriminate against any applicant or employee because of race, color, religion, sex, sexual orientation, gender identity, national origin, disability, genetic information, age, or military or veteran status. Additionally, the Company complies with applicable state and local laws governing non-discrimination in employment in every jurisdiction in which it operates. Promise is committed to promoting diversity and inclusion in the workplace. We also provide reasonable accommodations to qualified individuals with disabilities, pregnant individuals, and those with sincerely held religious beliefs, in accordance with applicable laws.

Promise engages in US government contracts and restricts hiring to US persons, which includes US citizens and permanent residents (e.g., Green Card holders). Additionally, candidates must reside in the US.
