Talent.com
Infrastructure Engineer — Systems & Platform

Sixtyfour
San Francisco, CA, United States
4 days ago
Job type
  • Full-time
Job description

What You’ll Do

  • Design and maintain highly available, scalable infrastructure across AWS (ECS, EKS, Lambda, SQS, CloudFront, CloudWatch).
  • Architect automated CI/CD pipelines (GitHub Actions, Terraform) with strong testing, observability, and rollback safety.
  • Optimize LLM inference infrastructure, including autoscaling GPU/CPU clusters, caching, async queues, batching, and tracing.
  • Improve deployment workflows and environment consistency using Docker, IaC, and lightweight configuration management.
  • Work on backend performance, including queue throughput, caching strategies, database indexing, and load balancing.
  • Monitor, debug, and improve system reliability and latency across all services (API, inference, and web app).
  • Build internal tools that enhance developer productivity and operational visibility.
  • Partner with engineers to evolve the workflow and job execution engine for better parallelism, retry logic, and observability.
  • Set up metrics, tracing, and alerting (OpenTelemetry, Prometheus, Grafana, Sentry) to make reliability measurable and actionable.

Minimum Requirements

  • Strong experience with cloud infrastructure (AWS preferred) including EC2, ECS, EKS, Lambda, S3, VPCs, networking, and IAM.
  • Proficiency with Docker and CI/CD tools such as GitHub Actions or CircleCI.
  • Experience scaling Python backend systems and modern web APIs (FastAPI preferred).
  • Hands-on experience with API servers and background workers (Celery, Redis queues, etc.).
  • Comfort with Postgres and Redis, including schema design, caching, rate limiting, and locks.
  • Strong observability mindset, including logs, metrics, and traces.
  • Production experience with autoscaling, load testing, and cost‑aware resource optimization.
  • Excellent debugging and on‑call discipline with a focus on uptime and reliability.

Nice to Have

  • Experience managing LLM serving infrastructure (OpenAI‑compatible APIs, vLLM, Triton, or similar).
  • Familiarity with Next.js and TypeScript to understand end‑to‑end deployment pipelines.
  • Experience with Terraform, Pulumi, or similar IaC tools.
  • Security‑focused mindset, including network boundaries, secret management, and RBAC.
  • Knowledge of real‑time systems (SSE or WebSockets) or stream processing.
  • Experience building developer platform tools or internal DevOps systems.
