The Role :
We are seeking a Principal AI Engineer to lead the design and advancement of our AI platform. You will play a key role in shaping the infrastructure that powers large-scale training and cloud inference. This includes accelerating training throughput, scaling multi-modal models, and enabling the next generation of AI-driven driving systems. We're tackling challenges across distributed training, training efficiency, DDP / FSDP, data processing pipelines, and PyTorch model optimization. This is a highly impactful position where your technical leadership will define how we scale AI to achieve autonomy.
What You’ll Do :
- Architect, build, and optimize core AI / ML platform infrastructure to support massive-scale model training.
- Collaborate with data scientists, ML engineers, and software developers to enable seamless workflows from research to production.
- Drive efficiency in large-scale distributed training and data processing pipelines.
- Establish best practices for reliability, scalability, and performance across the AI / ML platform.
- Provide technical leadership and mentorship, guiding teams on platform design, architecture decisions, and emerging technologies.
- Partner with cross-functional stakeholders to align platform capabilities with business needs and strategic AI initiatives.
Your Skills & Abilities (Required Qualifications) :
- Bachelor’s degree or higher in Computer Science, related field, or equivalent experience.
- 8+ years of professional software engineering experience.
- 4+ years of specialized experience in AI / ML domain (e.g., enabling distributed training for large-scale models).
- Strong programming skills in Python, with proficiency in frameworks such as PyTorch (preferred) or TensorFlow.
- Experience with distributed systems, GPU computing, and cloud environments (AWS, GCP, or Azure).
- Comfortable operating in highly ambiguous and dynamic environments.
- Willingness to travel to Sunnyvale, CA as needed.
What Will Give You a Competitive Edge (Preferred Qualifications) :
- Proven track record of self-motivation, execution, and delivering impact.
- Deep expertise with PyTorch 2.x+ and distributed training frameworks.
- Strong skills in profiling, analysis, debugging, and optimizing training performance (e.g., avoiding memory fragmentation, operation fusion).
- Proficiency in C++ for performance-critical components.
- Experience leading cross-functional projects and aligning diverse stakeholders on priorities.
Compensation : The compensation information is a good faith estimate only. It is based on what a successful applicant might be paid in accordance with applicable state laws. The compensation may not be representative for positions located outside of New York, Colorado, California, or Washington.
The salary range for this role is $197,600 to $374,200. The actual base salary a successful candidate will be offered within this range will vary based on factors relevant to the position.
Bonus Potential : An incentive pay program offers payouts based on company performance, job level, and individual performance.
Benefits : GM offers a variety of health and wellbeing benefit programs. Benefit options include medical, dental, vision, Health Savings Account, Flexible Spending Accounts, retirement savings plan, sickness and accident benefits, life insurance, paid vacation & holidays, tuition assistance programs, employee assistance program, GM vehicle discounts and more.
Work Location : This role is based remotely but if you live within a 50-mile radius of Atlanta, Austin, Detroit, Warren, Milford or Mountain View, you are expected to report to that location three times a week, at minimum.