No longer accepting applications

ML Performance Engineer

Gridmatic • Cupertino, CA, US

19 hours ago

Job type

Full-time

Job description

The Company

Gridmatic Inc. is a high-growth startup with offices in the Bay Area and Houston that is accelerating the clean energy transition by applying our expertise in data, machine learning, and energy to power markets. We are the rare startup that has multiple years of profitability without raising venture capital. At Gridmatic, we foster a collaborative and inclusive culture where learning and growth are constant. We move quickly, solve problems with integrity, and balance environmental responsibility with data-driven excellence.

We are looking for a Machine Learning Infrastructure Engineer to accelerate the decarbonization of the electricity system by building and optimizing the backbone of our ML platform. The ideal candidate will have solid expertise in machine learning, distributed systems, and GPU-based training. They will design scalable, high-performance infrastructure for training, inference, and evaluation. They will push the boundaries of throughput and efficiency on large-scale time-series and weather datasets, while shaping the long-term vision of our ML platform. A successful candidate will thrive on continuous learning across engineering, ML systems, and energy markets, while contributing to a collaborative, mission-driven team. The ideal candidate must have strong deep learning fundamentals in addition to strong software engineering skills.

You will:

Own a significant piece of our ML platform while rapidly building and iterating scalable, robust distributed infrastructure for ML training, inference, and evaluation on large-scale time-series and weather datasets.
Optimize throughput and cost by supporting model training and deployment across multiple clusters and clouds.
Improve the efficiency of machine learning models and other workloads by optimizing latency, throughput, and memory consumption. This involves pushing the boundaries of current hardware capabilities through techniques like GPU performance engineering.
Help define the long-term vision for Gridmatic’s ML platform.
Play a key role in mentoring junior engineers and interns, contributing to a collaborative, innovative, and growth-oriented team culture.

You must be:

A strong engineer with 3+ years of full-time industry experience working on ML systems. You possess a deep understanding of the codebases you work in and write readable, scalable code.

Experienced in optimizing GPU throughput in deep learning models.

Experienced in distributed training and inference of large models on GPU clusters, utilizing core libraries and frameworks such as PyTorch, PyTorch Lightning, and Ray.

A self-starter with a strong sense of independence and ownership, and the capability to engineer large, robust systems from the initial design and conceptualization to productionization.

A holder of a Master's or Doctoral degree in engineering or a related technical field.

A mission-driven individual who is enthusiastic about working toward a renewable grid and diving into the intersection of ML and energy. No prior energy experience required, but curiosity and a willingness to learn are must-haves!

Nice to haves:

End-to-end proficiency in building, maintaining, and debugging cluster infrastructure, utilizing Kubernetes and Terraform.

Expertise in identifying performance bottlenecks and designing and writing high-performance code for large-scale ML workloads.

Experience with at least one of: torch.profiler, TorchDynamo, TorchInductor, Triton, or other deep learning compiler stacks.

Understanding of GPU architectures or experience with GPU kernel programming.

Knowledge of cluster communication protocols such as NCCL or Gloo.

Experience working with any of the following: weather data, energy systems, time-series forecasting, electricity markets, or financial trading.

Join our team and make a difference! Click below or email us at careers@gridmatic.com.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
