Talent.com
Site Reliability Engineer (SRE)

AI Fund • San Francisco, CA, United States
30+ days ago
Job type
  • Full-time
Job description

About Baseten

Baseten powers inference for the world's most dynamic AI companies, like OpenEvidence, Clay, Mirage, Gamma, Sourcegraph, Writer, Abridge, Bland, and Zed.

THE ROLE

As a Site Reliability Engineer, you'll envision and build robust systems and processes that ensure our infrastructure is scalable, reliable, and efficient. This can range from automating deployments and monitoring systems to optimizing performance and managing incidents.

We all work closely with our users, learning from their past struggles in operationalizing ML, onboarding them onto our platform, and turning our learnings into ideas for improving Baseten.

EXAMPLE INITIATIVES

You'll get to work on these types of projects as part of our Infrastructure team:

Responsibilities

  • Build and maintain scalable infrastructure to support the deployment and operation of machine learning models.
  • Establish standards and best practices for reliability and performance across the infrastructure.
  • Automate processes where relevant, particularly for managing CI/CD pipelines.
  • Own products and projects end-to-end, functioning as both an engineer and a project manager, with a focus on user empathy, project specification, and end-to-end execution.
  • Collaborate with cross-functional teams to understand project requirements and translate them into technical solutions.
  • Mentor junior team members and contribute to knowledge sharing within the organization.
  • Navigate ambiguity and exercise good judgment on tradeoffs and tools needed to solve problems, avoiding unnecessary complexity.
  • Demonstrate pride, ownership, and accountability for your work, expecting the same from your teammates.

Requirements

  • Bachelor's, Master's, or Ph.D. degree in Computer Science, Engineering, Mathematics, or related field.
  • 3+ years of professional work experience in a fast-paced, high-growth environment.
  • Extensive experience with Kubernetes.
  • Experience in building and maintaining scalable infrastructure.
  • Experience with infrastructure-as-code tools (e.g., Terraform, CloudFormation, Pulumi) and CI/CD tooling (e.g., GitHub Actions, GitLab CI, CircleCI, Jenkins).
  • Relevant OSS observability experience (Prometheus, ELK stack, Grafana stack, OpenTelemetry) is a plus.
  • Ability to own projects end-to-end, from project specification to execution.
  • No prior machine learning experience required, but should be open to learning about it.

Benefits

  • Competitive compensation package.
  • This is a unique opportunity to be part of a rapidly growing startup in one of the most exciting engineering fields of our era.
  • An inclusive and supportive work culture that fosters learning and growth.
  • Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.

Apply now to embark on a rewarding journey in shaping the future of AI! If you are a motivated individual with a passion for machine learning and a desire to be part of a collaborative and forward-thinking team, we would love to hear from you.

At Baseten, we are committed to fostering a diverse and inclusive workplace. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, or veteran status.
