Senior On-Device Model Inference Optimization Engineer

NVIDIA, Santa Clara, CA, United States

3 days ago

Job type

Full-time

Job description

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.

We are seeking a highly skilled Senior On-Device Model Inference Optimization Engineer to join our team and lead efforts to improve the performance and efficiency of the AI models that enable the next generation of autonomous vehicle technology at NVIDIA!

What you'll be doing:

Develop and implement strategies to optimize AI model inference for on-device deployment.

Employ techniques like pruning, quantization, and knowledge distillation to minimize model size and computational demands.

Optimize performance-critical components using CUDA and C++.

Collaborate with cross-functional teams to align optimization efforts with hardware capabilities and deployment needs.

Benchmark inference performance, identify bottlenecks, and implement solutions.

Research and apply innovative methods for inference optimization.

Adapt models for diverse hardware platforms and operating systems with varying capabilities.

Create tools to validate the accuracy and latency of deployed models at scale with minimal friction.

Recommend and implement model architecture changes to improve the accuracy-latency balance.

What we need to see:

MSc or PhD in Computer Science, Engineering, or a related field, or equivalent experience.

Over 10 years of confirmed experience specializing in model inference and optimization.

Expertise in modern machine learning frameworks, particularly PyTorch, ONNX, and TensorRT.

Proven experience in optimizing inference for transformer and convolutional architectures.

Strong programming proficiency in CUDA, Python, and C++.

In-depth knowledge of optimization techniques, including quantization, pruning, distillation, and hardware-aware neural architecture search.

Skilled in building and deploying scalable, cloud-based inference systems.

Passionate about developing efficient, production-ready solutions with a strong focus on code quality and performance.

Meticulous attention to detail, ensuring precision and reliability in safety-critical systems.

Strong collaboration and communication skills for working effectively across multidisciplinary teams.

Ways to stand out from the crowd:

Publications or industry experience in optimizing and deploying model inference at scale.

Hands-on expertise in hardware-aware optimizations and accelerators such as GPUs, TPUs, or custom ASICs.

Active contributions to open-source projects focused on inference optimization or machine learning frameworks.

Experience in designing and deploying inference pipelines for real-time or autonomous systems.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until October 10, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
