No longer accepting applications
Member of Technical Staff - ML Infrastructure Engineer

Black Forest Labs Inc.
San Francisco, CA, United States
1 day ago
Job type
  • Full-time
Job description

Overview

Black Forest Labs is a cutting-edge startup pioneering generative image and video models. Our team, which invented Stable Diffusion, Stable Video Diffusion, and FLUX.1, is seeking a strong candidate to join us in developing and maintaining our ML infrastructure, including large GPU training and inference clusters.

Responsibilities

  • Design, deploy, and maintain cloud-based ML training (Slurm) and inference (Kubernetes) clusters
  • Implement and manage network-based cloud file systems and blob/S3 storage solutions
  • Develop and maintain Infrastructure as Code (IaC) for resource provisioning
  • Implement and optimize CI/CD pipelines for ML workflows
  • Design and implement custom autoscaling solutions for ML workloads
  • Ensure security best practices across the ML infrastructure
  • Provide developer-friendly tools and practices for efficient ML operations

Ideal Experience

  • Strong proficiency in cloud platforms (AWS, Azure, or GCP) with a focus on ML/AI services
  • Extensive experience with Kubernetes and Slurm cluster management
  • Expertise in Infrastructure as Code tools (e.g., Terraform, Ansible)
  • Proven track record in managing and optimizing network-based cloud file systems and object storage
  • Experience with CI/CD tools and practices (e.g., CircleCI, GitHub Actions, ArgoCD)
  • Strong understanding of security principles and best practices in cloud environments
  • Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Loki)
  • Familiarity with ML workflows and GPU infrastructure management
  • Demonstrated ability to handle complex migrations and breaking changes in production environments

Nice to have

  • Experience with custom autoscaling solutions for ML workloads
  • Knowledge of cost optimization strategies for cloud-based ML infrastructure
  • Familiarity with MLOps practices and tools
  • Experience with high-performance computing (HPC) environments
  • Understanding of data versioning and experiment tracking for ML
  • Knowledge of network optimization for distributed ML training
  • Experience with multi-cloud or hybrid cloud architectures
  • Familiarity with container security and vulnerability scanning tools

EEO and Privacy

Black Forest Labs is an equal opportunity employer. We do not discriminate on the basis of any protected status under applicable law. Employment is contingent on compliance with applicable laws and regulations. Voluntary self-identification of disability information is requested for government reporting purposes; participation is voluntary and will not affect hiring decisions. Any information provided is confidential.
