Talent.com
Senior Platform Telemetry Engineer
NVIDIA • Santa Clara, CA, United States
1 day ago
Job type
  • Full-time
Job description

NVIDIA’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI, the next era of computing, with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are increasingly known as “the AI computing company.” We are looking to grow our company and establish teams with the most thoughtful people in the world.

The NVIDIA GH200 superchip provides the performance and productivity required for strong scaling of HPC and generative AI workloads. Scale-out is inherent to the design of this massive superchip. We are looking for expert engineers to help design rack-level solutions for next-generation AI supercomputing platforms at scale.

Join us at the forefront of technological advancement.

What you will be doing:

Drive next-generation fleet management solutions for scaling AI infrastructure built on NVIDIA GPUs and Grace solutions. Work with customers, product management, and other architects to narrow down implementation requirements and ensure speed-of-light product development.

Bring clarity to the architecture of fleet health monitoring and fault-remediation solutions at scale. Work with customers and other architects to understand their health monitoring requirements, making the best use of available in-band and out-of-band capabilities. Produce detailed architecture and build POCs to validate it.

Educate customers about the product architecture and incorporate their feedback. Write architecture specs and design documents, and own end-to-end product delivery by working across teams. Review the code produced from the architecture specs.

Ensure the product is properly tested by working with the development team to enhance unit testing and put a proper test plan in place.

Drive product life cycles with QA teams to productize the code, and take responsibility as product owner.

Articulate requirements in Jira and bug management tools, and work out an end-to-end execution plan in collaboration with other managers.

Contribute to all phases of product development, from product definition, architecture, and design, through implementation, debugging, testing and early customer support.

What we need to see:

BS, MS, or PhD in EE / CS or related field of education (or equivalent experience).

5+ years of hands-on coding experience.

Strong knowledge of time series databases such as InfluxDB and Prometheus.

Strong knowledge of building and consuming REST APIs (Redfish is a big plus).

Strong knowledge of telemetry visualization solutions such as Grafana and Influx.

Strong knowledge of firmware architecture and of optimizing firmware for low-latency APIs.

Strong knowledge of analyzing algorithms for time and space complexity and of projecting system resource requirements.

Proven record of solutions for scalability

Strong and demonstrable skill in C / C++ and Python

Programming and debugging experience on server platforms.

Experience in SCM (e.g., Git, Perforce) and project management tools like Jira.

You should possess excellent written and oral communication skills, an excellent work ethic, a great sense of teamwork, a love of producing quality work, and a commitment to finishing your tasks every single day.

You are a self-starter who loves to find creative solutions to complicated problems and is hands-on with coding.

Ways to stand out from the crowd:

Experience building telemetry collection & analysis engines. Experience with Redfish. Experience with notification systems like PagerDuty.

Active Open Compute (OCP) and DMTF contributor in relevant areas. Hands-on experience with x86 or ARM system architecture.

Familiarity with Confidential Compute.

Experience with ML and multi-variable optimization techniques.

NVIDIA is considered one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us. If you are creative and autonomous, we want to hear from you!

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 148,000 USD - 235,750 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until November 21, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
