Talent.com
Machine Learning Engineer / Senior Machine Learning Engineer
C3.ai, Inc. • Redwood City, CA, United States
16 days ago
Job type
  • Full-time
Job description

C3 AI (NYSE: AI) is the Enterprise AI application software company. C3 AI delivers a family of fully integrated products, including the C3 Agentic AI Platform, an end-to-end platform for developing, deploying, and operating enterprise AI applications; C3 AI applications, a portfolio of industry-specific SaaS enterprise AI applications that enable the digital transformation of organizations globally; and C3 Generative AI, a suite of domain-specific generative AI offerings for the enterprise. Learn more at: C3 AI

The C3 AI Data Science team is dedicated to pushing the boundaries of what is possible with large-scale AI. We are seeking a hands-on Machine Learning Engineer to design, build, and operate a bespoke, next-generation research platform dedicated to training novel, large-scale foundation models far beyond conventional LLM recipes.

This is a critical systems role. You will create the orchestration, secure data pathways, and frictionless developer experience that empower our researchers to move fast, experiment securely, and scale complex training jobs on heterogeneous GPU clusters.

Responsibilities:

We are looking for an expert who can solve infrastructure problems where off-the-shelf cloud tools are insufficient.

  • Design and manage the core research compute cluster, including node layouts, queues/partitions, preemption/fair-share policies, and multi-tenant isolation.
  • Implement secure access controls for all users and services across the cluster using Kubernetes and/or SLURM.
  • Build robust branch-to-experiment CI/CD workflows, encompassing templated job creation, config promotion, and integrated version control.
  • Implement an experiment and metrics tracking system (runs, configs, checkpoints, logs) with searchable lineage to enable frictionless cross-team collaboration and sharing.
  • Design and integrate auto-checkpointing, artifact retention, and necessary rollout/rollback mechanisms.
  • Stand up robust dataset registries, ensuring data lineage, versioning, and secure access.
  • Implement sharding, streaming, and prefetch mechanisms to support efficient access to TB-scale data corpora and long-term archival with reproducible rehydration.
  • Profile NCCL and I/O hotspots and optimize training throughput (mixed precision/AMP, ZeRO/FSDP, kernel fusion, caching).
  • Harden training pipelines for scale and resilience, including checkpoint recovery and tolerance for spot/preemptible instances.
  • Build opinionated templates, job specifications, and guardrails to ensure researchers can focus on modifying custom training code and recipes without fighting infrastructure bottlenecks.

Qualifications:

  • BS/MS in Computer Science/Electrical Engineering or equivalent deep, practical experience.
  • 5+ years of work experience (8+ years for Senior Machine Learning Engineer).
  • Proven track record building custom ML/HPC platforms for specialized research (e.g., novel model architectures, time-series, multimodal AI) where commercial cloud tools were insufficient.
  • Deep expertise with Kubernetes and/or SLURM on GPU clusters, including proficiency with containers, images, and multi-node scheduling.
  • Strong software development skills in Python and one of Go, C++, or Rust. Comfortable developing controllers/operators, high-performance services, and CLI tooling on Linux.
  • Practical, hands-on knowledge of distributed ML frameworks (PyTorch DDP/FSDP/ZeRO, DeepSpeed, or JAX/TPU) and performance profiling (NCCL, CUDA basics, I/O performance).
  • Experience with object stores, Parquet format, dataset version control, streaming/sharding techniques, and efficient artifact management for checkpoints and logs.
  • Practical experience with observability (Prometheus/Grafana/OpenTelemetry) and infra-as-code (Terraform/Helm/Ansible).

Preferred Qualifications:

  • Experience with high-speed networking and storage, including InfiniBand/RDMA, GPUDirect-RDMA, NVLink topology, and high-throughput file/object systems.
  • Direct experience modifying or working with K8s device plugins, custom schedulers/quotas, or SLURM internals (fair-share/preemption).
  • Expertise in implementing true reproducibility at scale: seeding, deterministic builds, environment capture, and robust dataset and experiment lineage that guarantees re-runnability months later.
  • Experience with advanced performance work such as kernel fusion, custom CUDA operations, and fine-tuning complex FSDP/ZeRO configurations.
  • A pragmatic, product-focused approach to researcher ergonomics, demonstrated by platforms you have shipped that materially increased experiment throughput and velocity.

C3 AI provides excellent benefits, a competitive compensation package, and a generous equity plan.

California Base Pay Range: $140,000 to $206,000 USD

C3 AI is proud to be an Equal Opportunity and Affirmative Action Employer. We do not discriminate on the basis of any legally protected characteristics, including disabled and veteran status.
