What to Expect
As a member of the AIHW Infra team, you will play a critical role in supporting Tesla's AI hardware initiatives by developing automation, infrastructure, and services. Join a dynamic team of engineers dedicated to accelerating workloads through collaboration with AI HW design teams and High-Performance Computing (HPC) groups. Your primary focus will be building robust infrastructure solutions, while also assisting in debugging performance bottlenecks and root-causing cluster issues as needed. The ideal candidate is a proactive engineer with a passion for creating scalable, efficient systems.
What You'll Do
Developing Python libraries to automate, monitor, measure, and troubleshoot workflows on AI hardware infrastructure
Spending approximately 50% of your time building automation and infrastructure, primarily in Python alongside other languages, with the remaining time focused on debugging, experimentation, and resolving infrastructure challenges and performance bottlenecks
Creating and maintaining tools for infrastructure, automation, observability, and reporting to ensure system reliability and performance
Collaborating with AI HW design and HPC teams to identify, debug, and resolve performance bottlenecks and cluster issues
Supporting internal users by triaging errors, root-causing issues, and providing effective, maintainable solutions
Proactively addressing potential infrastructure challenges to minimize user impact and enhance scalability and efficiency of AI hardware workloads
What You'll Bring
Degree in Engineering or Computer Science, or equivalent experience, with evidence of exceptional ability and practical software engineering expertise
Strong proficiency in Python and adaptability to learn new languages and frameworks
Extensive familiarity with Linux administration and internals
Experience or strong interest in automation, observability, and infrastructure development and deployment
Ability to collaborate effectively with cross-functional teams to debug and optimize complex systems
Compensation and Benefits
Benefits
Along with competitive pay, as a full-time Tesla employee, you are eligible for the following benefits at day 1 of hire:
2 medical plan options with $0 payroll deduction
Expected Compensation
$132,000 - $300,000/annual salary + cash and stock awards + benefits
Pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. The total compensation package for this position may also include other elements dependent on the position offered. Details of participation in these benefit plans will be provided if an employee receives an offer of employment.
Software Engineer Infrastructure • Palo Alto, CA, United States