Talent.com
Senior Software Engineer, Profiling Services
Nvidia Corporation • Santa Clara, CA, United States
12 days ago
Job type
  • Full-time
Job description

Overview

Are you ready to innovate in GPU performance analysis for Machine Learning workloads? Join our Developer Tools Always-On Profiling (AON) team as a Senior Software Architect, where you'll be pivotal in designing, implementing, and leading our Always-On Profiling service. This role demands deep technical expertise, a proven track record of solving ambiguous challenges, and strong technical leadership skills.

Responsibilities

  • Architect and Build Scalable Systems: Drive the design and implementation of the AON profiling service's core systems. Master inter-process communication (IPC), memory management, and low-overhead architectures to handle profiling data from complex multi-node, multi-process, multi-GPU, and cluster environments.
  • Elevate Software Engineering Excellence: Promote high standards in software development, including design patterns, concurrency, parallelism, and advanced debugging for asynchronous systems. Commit to code quality and robust testing to ensure a reliable profiling service.
  • Lead, Mentor, and Innovate: Guide and mentor engineers, provide impactful code reviews, and shape technical roadmaps. Proactively identify complex technical issues within the AON project, break them down, and craft innovative solutions. Problem-solving prowess is crucial for AON's success with ML workloads.
  • Architect and Build High-Performance Platforms: Transform user needs into clear requirements and design documents. Explore diverse approaches to problems, make well-reasoned recommendations, and lead end-to-end feature development, from planning and prototyping to implementation, testing, and customer evaluation. Expect hands-on development across user applications, drivers, performance counter libraries, and lower-level platform/hardware abstraction layers.
  • Collaborate Across Boundaries: Partner effectively with diverse internal and external teams. Exceptional communication and collaboration skills are key to integrating AON seamlessly into the broader profiling and ML ecosystem.

Qualifications

  • BS or MS degree or equivalent experience in Computer Engineering, Computer Science, or a related field.
  • 6+ years of meaningful software development experience in C, C++, and Python.
  • 6+ years in system software design, operating systems fundamentals, computer architectures, performance analysis, and delivering production-quality software.
  • Strong interpersonal, verbal, and written communication, demonstrating the ability to build cross-organizational partnerships and lead technical teams through complex challenges.
  • Profiling & Performance Tools Expert: Extensive knowledge of profiling technologies (sampling, tracing), overhead analysis, and diverse profiling data (CPU/GPU events, performance counters, API traces, event correlation). Familiarity with existing profiling ecosystems and their limitations is a plus.
  • GPU & CUDA Proficiency: In-depth knowledge of CUDA APIs, runtime, streams, kernels, and GPU architecture.
  • ML Ecosystem & Performance Analysis: Familiarity with ML frameworks such as PyTorch and JAX, and knowledge of performance analysis for AI training/inference applications.
  • Large-Scale System Development & Debugging: Experience developing and debugging across complex multi-layered software systems, including user mode and kernel drivers, with a proven ability to contribute to and extend substantial codebases (100s of millions of lines).
  • Proficiency in Designing APIs and Interfaces for Profiling Tools: Designs robust, flexible APIs and interfaces enabling seamless integration of profiling tools with various frameworks and custom code.
  • Mastery of Problem Simplification: A history of breaking down ill-defined problems in complex technical domains, designing effective solutions, and leading teams to implement them.

Ways to Stand Out

  • Pioneering Low-Overhead Profiling Systems: A track record of designing and implementing profiling systems with minimal performance impact on target workloads, especially in complex multi-process and distributed environments.
  • Deep Understanding of PyTorch Internals & CUDA Usage: A comprehensive grasp of how PyTorch uses CUDA, including tensor memory, operations, and distributed training functionalities.
  • GPU Performance Analysis & Optimization Acuity: The ability to analyze profiling data and translate it into concrete, actionable insights, particularly within CUDA and ML frameworks like PyTorch.
  • Translating Customer Needs: Skilled at redefining customer requests into actionable use cases and requirements.
  • Strong understanding of system security principles.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 148,000 USD - 235,750 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until November 10, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
