Talent.com
ML Performance Engineer
No longer accepting applications
Gridmatic • Cupertino, CA, US
19 hours ago
Job type
  • Full-time
Job description

The Company

Gridmatic Inc. is a high-growth startup with offices in the Bay Area and Houston that is accelerating the clean energy transition by applying our expertise in data, machine learning, and energy to power markets. We are the rare startup that has multiple years of profitability without raising venture capital. At Gridmatic, we foster a collaborative and inclusive culture where learning and growth are constant. We move quickly, solve problems with integrity, and balance environmental responsibility with data-driven excellence.

We are looking for a Machine Learning Infrastructure Engineer to accelerate the decarbonization of the electricity system by building and optimizing the backbone of our ML platform. The ideal candidate will have solid expertise in machine learning, distributed systems, and GPU-based training. They will design scalable, high-performance infrastructure for training, inference, and evaluation. They will push the boundaries of throughput and efficiency on large-scale time-series and weather datasets, while shaping the long-term vision of our ML platform. A successful candidate will thrive on continuous learning across engineering, ML systems, and energy markets, while contributing to a collaborative, mission-driven team. The ideal candidate must have strong deep learning fundamentals in addition to strong software engineering skills.

You will:

  • Own a significant piece of our ML platform while rapidly building and iterating scalable, robust distributed infrastructure for ML training, inference, and evaluation on large-scale time-series and weather datasets.
  • Optimize throughput and cost by supporting model training and deployment across multiple clusters and clouds.
  • Improve the efficiency of machine learning models and other workloads by optimizing latency, throughput, and memory consumption. This involves pushing the boundaries of current hardware capabilities through techniques like GPU performance engineering.
  • Help define the long-term vision for Gridmatic’s ML platform.
  • Play a key role in mentoring junior engineers and interns, contributing to a collaborative, innovative, and growth-oriented team culture.

You must be:

  • A strong engineer with 3+ years of full-time industry experience working on ML systems. You possess a deep understanding of the codebases you work in and write readable, scalable code.
  • Experienced in optimizing GPU throughput in deep learning models.
  • Experienced in distributed training and inference of large models on GPU clusters, utilizing core libraries and frameworks such as PyTorch, PyTorch Lightning, and Ray.
  • A self-starter with a strong sense of independence and ownership, and the capability to engineer large, robust systems from the initial design and conceptualization to productionization.
  • A holder of a Master's or Doctorate degree in engineering or a related technical field.
  • A mission-driven individual who is enthusiastic about working toward a renewable grid and diving into the intersection of ML and energy. No prior energy experience required, but curiosity and a willingness to learn are must-haves!

Nice to haves:

  • End-to-end proficiency in building, maintaining, and debugging cluster infrastructure, utilizing Kubernetes and Terraform.
  • Expertise in identifying performance bottlenecks and designing and writing high-performance code for large-scale ML workloads.
  • Experience with at least one of: torch.profiler, TorchDynamo, TorchInductor, Triton, or other deep learning compiler stacks.
  • Understanding of GPU architectures or experience with GPU kernel programming.
  • Knowledge of cluster communication backends such as NCCL or Gloo.
  • Experience working with any of the following: weather data, energy systems, time-series forecasting, electricity markets, or financial trading.

Join our team and make a difference! Click below or email us at careers@gridmatic.com.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
