Talent.com
Site Reliability Engineer - Inference

Jobright.ai • San Francisco, CA, United States
30+ days ago
Job type
  • Full-time
Job description

Jobright is an AI-powered career platform that helps job seekers discover the top opportunities in the US. We are NOT a staffing agency. Jobright does not hire directly for these positions. We connect you with verified openings from employers you can trust.

Job Summary :

Lambda is the #1 GPU Cloud for ML / AI teams, providing tools for building, testing, and deploying AI products at scale. The Site Reliability Engineer - Inference will work on developing a large-scale platform for running AI models and building a high-throughput, low-latency API for distributed systems.

Responsibilities :

  • Work on our Inference service, helping us to develop our large-scale platform for running new, cutting-edge models across tens of thousands of GPUs
  • Help build a high-throughput, low-latency API and routing system running at geographically-distributed scale
  • Shape a highly reliable distributed system, with a focus on reducing operational overhead and on deep observability and capacity management
  • Work with the team and our internal ML researchers to adopt and improve new inference engines, models and architectures across a variety of different mediums (such as text, image, video and audio)
  • Tackle global networking challenges to deliver the lowest possible latency to our users across all of Lambda’s available capacity
  • Help push Lambda forward into the state of the art, and be part of a team that is operating right at the edge of new developments in the industry.

Qualifications :

Required :

  • 8 or more years of experience as a software reliability engineer or software engineer working on large-scale, internet-facing production services
  • Highly skilled at writing Go and Python
  • Experience with bare-metal system installation and administration
  • Experience deploying applications and operators on Kubernetes
  • Product-focused, balancing operational needs and keeping overheads down with the need to ship features at a rapid pace
  • Proven track record of working in an environment with rapid deployment and the ability to stay on top of shifting priorities as the industry rapidly develops
  • Willingness to take ownership of projects and help drive them forwards through design, implementation, launch, and maintenance.

Preferred :

  • Experience working with machine learning models
  • Experience operating large-scale, geographically distributed systems
  • Experience developing Kubernetes operators and components

Company :

    Lambda provides infrastructure, cloud services, and software for the training and inferencing of AI models. Founded in 2012 and headquartered in San Jose, California, USA, the company has 201-500 employees and is currently late stage. Lambda has a track record of offering H1B sponsorships.

    Seniority level

    Mid-Senior level

    Employment type

    Full-time

    Job function

    Industries

    Software Development

    Inferred from the description for this job

    Medical insurance

    Vision insurance

    401(k)
