Senior GenAI Algorithms Engineer Model Optimizations for Inference

NVIDIA, Santa Clara, CA, United States
13 hours ago
Job type
  • Full-time
Job description

NVIDIA is at the forefront of the generative AI revolution! The Algorithmic Model Optimization Team focuses on optimizing generative AI models such as large language models (LLMs) and diffusion models for maximal inference efficiency, using techniques that range from quantization, speculative decoding, sparsity, distillation, and pruning to neural architecture search, along with streamlined deployment strategies built on open-source inference frameworks. We are seeking a Senior Deep Learning Algorithms Engineer to improve innovative generative AI models such as LLMs, VLMs, and multimodal and diffusion models. In this role, you will design, implement, and productionize model optimization algorithms for inference and deployment on NVIDIA’s latest hardware platforms. The focus is on ease of use, compute and memory efficiency, and achieving the best accuracy–performance tradeoffs through software–hardware co-design.

Your work will span multiple layers of the AI software stack, from algorithm design to integration, within NVIDIA’s ecosystem (TensorRT Model Optimizer, NeMo/Megatron, TensorRT-LLM) and open-source frameworks (PyTorch, Hugging Face, vLLM, SGLang). You may also dive deeper into GPU-level optimization, including custom kernel development with CUDA and Triton. This role offers a unique opportunity to work at the intersection of research and engineering, pushing the boundaries of large-scale AI optimization. We are looking for passionate engineers with strong foundations in both machine learning and software systems and architecture who are eager to make a broad impact across the AI stack.

What you’ll be doing:

Design and build modular, scalable model optimization software platforms that deliver exceptional user experiences while supporting diverse AI models and optimization techniques to drive widespread adoption.

Explore, develop, and integrate innovative deep learning optimization algorithms (e.g., quantization, speculative decoding, sparsity) into NVIDIA's AI software stack, including TensorRT Model Optimizer, NeMo/Megatron, and TensorRT-LLM.

Deploy optimized models into leading OSS inference frameworks and contribute specialized APIs, model-level optimizations, and new features tailored to the latest NVIDIA hardware capabilities.

Partner with NVIDIA teams to deliver model optimization solutions for customer use cases, ensuring optimal end-to-end workflows and balanced accuracy-performance trade-offs.

Conduct deep GPU kernel-level profiling to identify and capitalize on hardware and software optimization opportunities (e.g., efficient attention kernels, KV cache optimization, parallelism strategies).

Drive continuous innovation in deep learning inference performance to strengthen NVIDIA platform integration and expand market adoption across the AI inference ecosystem.

What we need to see:

Master’s, PhD, or equivalent experience in Computer Science, Artificial Intelligence, Applied Mathematics, or a related field.

5+ years of relevant work or research experience in deep learning.

Strong software design skills, including debugging, performance analysis, and test development.

Proficiency in Python, PyTorch, and modern ML frameworks and tools.

Proven foundation in algorithms and programming fundamentals.

Strong written and verbal communication skills, with the ability to work both independently and collaboratively in a fast-paced environment.

Ways to stand out from the crowd:

Contributions to PyTorch, JAX, vLLM, SGLang, or other machine learning training and inference frameworks.

Hands-on experience training or fine-tuning generative AI models on large-scale GPU clusters.

Proficiency in GPU architectures and compilation stacks, with the ability to analyze and debug end-to-end performance.

Familiarity with NVIDIA’s deep learning SDKs (e.g., TensorRT).

Experience developing high-performance GPU kernels for machine learning workloads using CUDA, CUTLASS, or Triton.

Increasingly known as “the AI computing company” and widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. Are you creative, motivated, and love a challenge? If so, we want to hear from you! Come join our model optimization group, where you can help build real-time, cost-effective computing platforms driving our success in this exciting and rapidly growing field.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 148,000 USD - 235,750 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until September 26, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
