Talent.com

Site Reliability Engineer

Compunnel • Richmond, CA, United States
1 day ago
Job type
  • Full-time
Job description

The Site Reliability Engineer will be responsible for ensuring the reliability, availability, and performance of applications and services as part of the transition from private to public cloud. This role will involve driving the development of reliability engineering practices, automating processes, and enhancing system resilience to support our digital transformation journey in the competitive digital banking landscape.

Key Responsibilities:

  • Strategize and lead the transition from private to public cloud with a focus on reliability engineering.
  • Ensure high availability, performance, and minimal downtime for applications and services.
  • Lead incident response efforts, including triage, resolution, and post-incident analysis.
  • Develop and maintain monitoring solutions, alerting mechanisms, and proactive issue detection.
  • Implement automation tools to streamline routine tasks and ensure seamless deployments and rollbacks.
  • Collaborate with development and operations teams for capacity planning, performance tuning, and scalability.
  • Work with security teams to implement best practices and ensure compliance with regulatory requirements.
  • Manage deployment pipelines, release processes, and configuration management for app services.
  • Analyze operational data and trends to identify areas for improvement in system reliability and operational efficiency.
  • Maintain and document operational procedures, troubleshooting guides, and best practices.
  • Develop and test disaster recovery plans and backup strategies to ensure business continuity.
  • Collaborate with cross-functional teams to align on reliability goals and incident response processes.
  • Participate in on-call rotations and provide 24/7 support for critical incidents.

Required Qualifications:

  • Proven experience in cloud reliability engineering, ideally in a public cloud environment (AWS, Azure).
  • Strong knowledge of incident response, root cause analysis, and system resilience practices.
  • Experience with automation tools for scaling infrastructure, deploying updates, and ensuring system reliability.
  • Familiarity with monitoring tools (Dynatrace, Splunk, etc.) and ability to create dashboards and alerts.
  • Excellent communication skills to collaborate effectively with cross-functional teams.
  • Experience with security best practices, vulnerability assessments, and regulatory compliance in cloud environments.
  • Ability to work in a fast-paced, high-pressure environment while maintaining a positive attitude.
  • Experience with infrastructure modernization, cloud migrations, or microservices architecture.

Preferred Qualifications:

  • Experience with OpenTelemetry collectors and monitoring platforms like Prometheus.
  • Familiarity with DevOps tools and deployment pipelines.
  • Knowledge of disaster recovery planning and business continuity strategies.
  • Familiarity with GitHub-based infrastructure management and automation.

Certifications:

  • Relevant cloud certifications (AWS, Azure, or Google Cloud) are a plus.
