Senior On-Device Model Inference Optimization Engineer

NVIDIA • Santa Clara, CA, United States

1 day ago

Job type

Full-time

Job description

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.

We are seeking a highly skilled Senior On-Device Model Inference Optimization Engineer to join our team and lead efforts to improve the performance and efficiency of the AI models that enable the next generation of autonomous vehicle technology at NVIDIA!

What you'll be doing:

Develop and implement strategies to optimize AI model inference for on-device deployment.

Employ techniques like pruning, quantization, and knowledge distillation to minimize model size and computational demands.

Optimize performance-critical components using CUDA and C++.

Collaborate with multi-functional teams to align optimization efforts with hardware capabilities and deployment needs.

Benchmark inference performance, identify bottlenecks, and implement solutions.

Research and apply innovative methods for inference optimization.

Adapt models for diverse hardware platforms and operating systems with varying capabilities.

Create tools to validate the accuracy and latency of deployed models at scale with minimal friction.

Recommend and implement model architecture changes to improve the accuracy-latency balance.

What we need to see:

MSc or PhD in Computer Science, Engineering, or a related field, or equivalent experience.

Over 10 years of confirmed experience specializing in model inference and optimization.

Expertise in modern machine learning frameworks, particularly PyTorch, ONNX, and TensorRT.

Proven experience in optimizing inference for transformer and convolutional architectures.

Strong programming proficiency in CUDA, Python, and C++.

In-depth knowledge of optimization techniques, including quantization, pruning, distillation, and hardware-aware neural architecture search.

Skilled in building and deploying scalable, cloud-based inference systems.

Passionate about developing efficient, production-ready solutions with a strong focus on code quality and performance.

Meticulous attention to detail, ensuring precision and reliability in safety-critical systems.

Strong collaboration and communication skills for working effectively across multidisciplinary teams.

Ways to stand out from the crowd:

Publications or industry experience in optimizing and deploying model inference at scale.

Hands-on expertise in hardware-aware optimizations and accelerators such as GPUs, TPUs, or custom ASICs.

Active contributions to open-source projects focused on inference optimization or machine learning frameworks.

Experience in designing and deploying inference pipelines for real-time or autonomous systems.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until October 10, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
