Talent.com
Software Engineer, Training Performance, AI Infrastructure
Tesla • Palo Alto, CA, United States
30+ days ago
Job type
  • Full-time
Job description

What to Expect

As a Software Engineer within the Autopilot AI Infrastructure team, you will work on reinforcing, optimizing, and scaling the infrastructure components that support AI research for Autopilot and Optimus.

At the core of our autonomy capabilities are neural networks that the research team designs to train on very large amounts of data across large-scale GPU clusters. Training these models robustly, at scale, and in the shortest possible time is critical to our mission.

What You'll Do

  • Reduce wall clock time to convergence of our training jobs by identifying bottlenecks in the ML stack, from data-loading up to the GPU
  • Integrate efficient, low-level code with the overall high-level training framework
  • Profile our workloads and implement solutions to increase training efficiency
  • Optimize workloads for efficient hardware utilization (e.g. CPU and GPU compute, data throughput, networking)

What You'll Bring

  • Members of the Autopilot AI Infrastructure team are expected to be adaptable to the dynamic requirements of AI research and capable of contributing across all parts of the AI training software stack
  • Practical experience programming in Python and/or C/C++
  • Experience programming in CUDA, cuDNN or Triton, particularly in the context of operations used in AI workloads
  • Experience profiling and optimizing CPU-GPU interactions (pipelining computation with data transfers, etc.)
  • Experience working with training frameworks (ideally PyTorch)
  • Proficiency in system-level software, in particular hardware-software interactions and resource utilization
  • Experience with parallel programming concepts and primitives
  • Understanding of modern machine learning concepts and state-of-the-art deep learning
  • Experience scaling neural network training jobs across many GPUs

Compensation and Benefits

Benefits

Along with competitive pay, as a full-time Tesla employee, you are eligible for the following benefits at day 1 of hire:

  • Aetna PPO and HSA plans (2 medical plan options with $0 payroll deduction)
  • Family-building, fertility, adoption and surrogacy benefits
  • Dental (including orthodontic coverage) and vision plans, both with options at a $0 paycheck contribution
  • Company-paid Health Savings Account (HSA) contribution when enrolled in the High Deductible Aetna medical plan with HSA
  • Healthcare and Dependent Care Flexible Spending Accounts (FSA)
  • 401(k) with employer match, Employee Stock Purchase Plans, and other financial benefits
  • Company-paid Basic Life, AD&D, short-term and long-term disability insurance
  • Employee Assistance Program
  • Sick and vacation time (flex time for salaried positions), and paid holidays
  • Back-up childcare and parenting support resources
  • Voluntary benefits, including critical illness, hospital indemnity, accident insurance, theft and legal services, and pet insurance
  • Weight Loss and Tobacco Cessation Programs
  • Tesla Babies program
  • Commuter benefits
  • Employee discounts and perks program

Expected Compensation

$118,000 - $390,000/annual salary + cash and stock awards + benefits

Pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. The total compensation package for this position may also include other elements dependent on the position offered. Details of participation in these benefit plans will be provided if an employee receives an offer of employment.
