Talent.com
Software Engineer, Machine Learning Infrastructure

DatologyAI, Redwood City, California, United States
30+ days ago
Job type
  • Full-time
Job description

About the Company

Companies want to train their own large models on their own data. The current industry standard is to train on a random sample of your data, which is inefficient at best and actively harmful to model quality at worst. There is compelling research showing that smarter data selection can train better models faster—we know because we did much of this research. Given the high costs of training, this presents a huge market opportunity. We founded DatologyAI to translate this research into tools that enable enterprise customers to identify the right data on which to train, resulting in better models for cheaper. Our team has pioneered deep learning data research, built startups, and created tools for enterprise ML.

Following our $11.65M Seed round last September, we've raised a $46M Series A led by Felicis Ventures. Our investors include Radical Ventures, Amplify Partners, Microsoft, Amazon, and notable angels like Jeff Dean, Geoff Hinton, Yann LeCun and Elad Gil. With over $57.5M in total funding, we're rapidly scaling our team and computing resources to revolutionize data curation across modalities.

Join us in pushing the boundaries of what's possible in AI!

About the Role

We’re looking for seasoned ML infrastructure engineers with experience designing, building, and maintaining both the training infrastructure for our in-house ML research and validation efforts and the core infrastructure that runs the curation pipeline we deliver to our customers. As one of our early senior hires, you will partner closely with our founders on the direction of our product and drive business-critical technical decisions.

You will contribute to developing core infrastructure components that impact our ability to deliver, scale, and deploy our product. These are key components of our stack that allow us to process customer data and apply state-of-the-art research to identify the most informative data points in large-scale datasets. You will have a broad impact on the technology, product, and our company's culture.

As an ML Infrastructure Engineer at DatologyAI, you will:

  • Architect, build, and maintain the infrastructure that keeps GPU training workloads highly available.
  • Troubleshoot and resolve issues across GPU resources, networking, OS, drivers, and cloud environments, and automate the detection and recovery of such issues.
  • Design, build, and maintain the infrastructure that powers our data curation product.
  • Partner with researchers and engineers to bring new features and research capabilities to our customers.
  • Ensure that our infrastructure and systems are reliable, secure, and worthy of our customers' trust.

This role is based in Redwood City, CA. We are in person 4 days a week and offer relocation assistance to new employees. We provide visa sponsorship for candidates selected for this role.

About You

There are a few specific things we’ll be looking for that will help you succeed in this role:

  • 5+ years of experience.
  • Meaningful experience leading and building production ML infrastructure and platforms that deliver on major product initiatives.
  • Proficiency in Python and in the most commonly used infrastructure tools: Linux, Kubernetes, Terraform/Pulumi, etc.
  • Strong knowledge of hardening cloud-native workloads, especially on K8s.
  • Experience maintaining a high quality bar for design, correctness, and testing.
  • A humble attitude, an eagerness to help your colleagues, and a desire to do whatever it takes to make the team succeed.
  • A habit of owning problems end-to-end and a willingness to pick up whatever knowledge you're missing to get the job done.

We would love it if candidates have:

  • Experience running data-processing workloads in K8s (e.g., Spark on K8s).

Compensation and Benefits

At DatologyAI, we are dedicated to rewarding talent with a highly competitive salary and significant equity. The salary for this position ranges from $180,000 to $250,000.

The candidate's starting pay will be determined based on job-related skills, experience, qualifications, and interview performance.

We also offer a comprehensive benefits package to support our employees' well-being and professional growth:

  • 100% covered health benefits (medical, vision, and dental).
  • 401(k) plan with a generous 4% company match.
  • Unlimited paid time off (PTO) policy.
  • Annual $2,000 wellness stipend.
  • Annual $1,000 learning and development stipend.
  • Daily lunches and snacks are provided in our office!
  • Relocation assistance for employees moving to the Bay Area.
