Overview
AI Performance Engineer (Cloud AI Engineering)
Qualcomm is utilizing its traditional strengths in digital wireless technologies to play a central role in the evolution of Cloud AI. We are investing in several supporting technologies including Deep Learning. The Qualcomm Cloud AI team is developing hardware and software solutions for Inference Acceleration. We are hiring AI Performance Engineers at multiple levels to join our dynamic, collaborative team. This role spans the full product lifecycle—from cutting-edge research and development to commercial deployment—and demands strategic thinking, strong execution, and excellent communication skills.
What You Will Do
- Convert, optimize and deploy models for efficient inference using PyTorch and ONNX.
- Work at the forefront of GenAI by understanding advanced algorithms (e.g. attention mechanisms, MoEs) and numerics to identify new optimization opportunities.
- Analyze and optimize the performance of LLM, VLM, and diffusion models for inference. Scale performance to meet throughput and latency constraints.
- Map next-generation AI workloads onto current and future hardware designs.
- Collaborate with customers and internal teams (compiler, firmware and platform) to drive solutions.
- Analyze complex performance or stability issues to identify root causes.
- Create solutions to deliver continuous insights into AI workload performance and guide improvements over time.
- Design and implement high-level kernels, e.g. in Triton, with a focus on generating efficient, low-level code.
Qualifications
- Hands-on experience in building and optimizing language models, notably in PyTorch, ONNX, preferably in production-grade environments.
- Deep understanding of transformer architectures, attention mechanisms and performance trade-offs.
- Experience in workload mapping strategies exhibiting sharding or various parallelisms.
- Strong Python programming skills.
- Proactive learning about the latest inference optimization techniques.
- Understanding of computer architecture, ML accelerators, in-memory processing and distributed systems.
- Strong communication, problem-solving skills and ability to learn and work effectively in a fast-paced and collaborative environment.
- MS in Computer Science, Machine Learning, Computer Engineering or Electrical Engineering.
Bonus Skills
- Background in neural network operators and mathematical operations, including linear algebra and math libraries.
- Understanding of machine learning compilers.
- Experience in converging accuracy and its evaluation methods.
- Knowledge of torch.compile or TorchDynamo.
- PhD in Computer Science, Computer Engineering or Machine Learning.
Minimum Qualifications
- Bachelor's degree in Computer Science, Engineering, Information Systems, or related field and 6+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.
- OR Master's degree in Computer Science, Engineering, Information Systems, or related field and 5+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.
- OR PhD in Computer Science, Engineering, Information Systems, or related field and 4+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.
Equal Opportunity
Qualcomm is an equal opportunity employer. If you are an individual with a disability and need an accommodation during the application / hiring process, Qualcomm provides an accessible process. You may email disability-accommodations@qualcomm.com or call Qualcomm's toll-free number. Qualcomm is committed to making our workplace accessible for individuals with disabilities.
Pay Range and Benefits
$178,400.00 - $267,600.00
The pay range reflects the broad minimum to maximum for this job code and location. Salary is only one component of total compensation; Qualcomm offers a competitive annual discretionary bonus program and RSU grants, plus a comprehensive benefits package.
For more information about this role, please contact Qualcomm Careers.