No longer accepting applications
Site Reliability Engineer

Berkley Hunt | San Francisco, CA, United States
8 days ago
Job type
  • Full-time
Job description

Senior Site Reliability Engineer (GPU Compute) | Hybrid — Bay Area, CA

Berkley Hunt is supporting a fast-growing AI startup building a high-performance, cloud-native platform to power cutting-edge machine learning workloads. As they scale, they’re hiring a Senior / Staff Infrastructure Engineer to lead the development of a scalable GPU compute environment from the ground up.

About the Role

This is a high-impact role for an experienced infrastructure engineer who thrives in fast-paced environments and wants to shape the future of AI infrastructure. You’ll design, build, and operate the systems that enable high-throughput GPU workloads at scale—collaborating closely with the core engineering team to optimize performance, efficiency, and reliability.

If you’re excited about solving deep technical challenges in distributed compute and cloud automation, this could be a standout opportunity.

Responsibilities

  • Build and maintain a large-scale, distributed GPU compute platform powering AI workloads.
  • Develop backend systems in Python to orchestrate GPU jobs and manage routing, observability, and capacity.
  • Design and implement infrastructure with tools like Terraform, Ansible, and Kubernetes across cloud and bare metal environments.
  • Own the reliability, scalability, and performance of the platform, from provisioning to deployment and monitoring.
  • Collaborate with the engineering team to shape infrastructure vision and technical strategy over the next 1–5 years.
  • Drive automation and improvements to minimize operational overhead and scale efficiently.

Requirements

  • 6+ years of experience in cloud infrastructure or backend engineering roles.
  • Deep knowledge of distributed compute systems, especially involving GPU orchestration.
  • Proficiency with Python and infrastructure-as-code tools (e.g., Terraform, Ansible).
  • Solid experience with Kubernetes and CI/CD pipelines.
  • Strong understanding of cloud platforms (AWS, GCP, or Azure); bare metal experience is a plus.
  • Excellent problem-solving skills and a proactive, ownership-driven mindset.

Nice to Have

  • Experience at a high-growth startup or in scaling large infrastructure systems.
  • Familiarity with GPU resource scheduling and performance optimization.
  • Hands-on experience with observability stacks (Prometheus, Grafana, Loki, Thanos).
  • A passion for automation, infrastructure design, and moving fast without breaking things.