Talent.com
No longer accepting applications
Site Reliability Engineer

Bits to Atoms
San Francisco, CA, United States
4 days ago
Job type
  • Full-time
Job description

The Site Reliability Engineer (SRE) will ensure the reliability, scalability, and performance of a hybrid (cloud + on-prem) platform. You’ll work at the intersection of infrastructure, AI / ML systems, and mission-critical physical operations. You’ll collaborate directly with engineering, AI, and operations teams to design resilient systems and bring cutting-edge AI models into production. This is a high-impact role — your work will directly shape how the world’s most advanced data centers operate.

Company Overview

Bits to Atoms has partnered with Fluix AI to fill its Site Reliability Engineer role. Fluix is building the AI operating system that plans, designs, and optimizes AI infrastructure. Based in Silicon Valley, Fluix specializes in AI-driven solutions for data centers and power providers, leveraging machine learning and AI technologies. Our mission is bold: to double America’s compute capacity without building new data centers.

Position Overview

The Site Reliability Engineer will ensure reliability, scalability, and performance across cloud and on-prem environments. You’ll work at the intersection of infrastructure, AI / ML systems, and mission-critical physical operations. You’ll collaborate with engineering, AI, and operations teams to design resilient systems and deploy AI models into production.

Who You’ll Work With

  • Chase Overcash – CTO

Responsibilities

  • Design, implement, and maintain scalable, fault-tolerant infrastructure across cloud and on-prem environments.
  • Build automation to streamline operations, reduce toil, and increase reliability.
  • Integrate ML / AI models into production systems and optimize their performance at scale.
  • Improve system resilience through monitoring, observability, and incident management.
  • Lead post-incident reviews, drive root-cause analysis, and implement preventative fixes.
  • Manage multi-environment cloud setups (dev, staging, prod) and optimize data center operations.
  • Ensure compliance and security across all infrastructure and applications.
  • Partner with engineering and data science teams to continuously improve deployment practices.

Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
  • Proven experience as an SRE, DevOps engineer, or similar role in a SaaS or infrastructure-heavy environment.
  • Strong expertise with cloud platforms (AWS preferred; GCP / Azure also valuable).
  • Proficiency in Python or similar scripting / programming languages.
  • Hands-on experience with containerization and orchestration (Kubernetes).
  • Solid understanding of networking, security, and performance optimization.
  • Familiarity with ML / AI infrastructure and data center operations is a strong plus.
  • Experience with CI/CD pipelines and infrastructure-as-code (Terraform, Ansible, etc.).
  • Excellent problem-solving skills and the ability to thrive in a fast-paced startup environment.

Culture Fit

  • Obsessed with solving hard problems and willing to dig deep.
  • Hands-on, comfortable with both physical and software systems.
  • Value being on-site and with clients, and understand the impact of mission-critical work.
  • Embrace flexibility — supporting teammates during weekends, holidays, or emergencies when needed.
  • Over-communicate, collaborate openly, and take ownership.

Why Fluix?

  • Competitive salary and equity package.
  • Comprehensive health, dental, and vision insurance.
  • Opportunities to shape the future of AI infrastructure and data center technology.
  • A collaborative, fast-paced environment in the San Francisco Bay Area.