Talent.com
Software Engineer, Data Infrastructure

DatologyAI • Redwood City, CA, United States
30+ days ago
Job type
  • Full-time
Job description

About the Company

Models are what they eat. But a large portion of training compute is wasted training on data that are already learned, irrelevant, or even harmful, leading to worse models that cost more to train and deploy.

At DatologyAI, we've built a state-of-the-art data curation suite to automatically curate and optimize petabytes of data to create the best possible training data for your models. Training on curated data can dramatically reduce training time and cost (7-40x faster training depending on the use case), dramatically increase model performance as if you had trained on >10x more raw data without increasing the cost of training, and allow smaller models with fewer than half the parameters to outperform larger models despite using far less compute at inference time, substantially reducing the cost of deployment. For more details, check out our recent blog posts sharing our high-level results for text models and image-text models.

We raised a total of $57.5M in two rounds, a Seed and a Series A. Our investors include Felicis Ventures, Radical Ventures, Amplify Partners, Microsoft, Amazon, and AI visionaries like Geoff Hinton, Yann LeCun, Jeff Dean, and many others who deeply understand the importance and difficulty of identifying and optimizing the best possible training data for models. Our team has pioneered this frontier research area and has the deep expertise in both data research and data engineering necessary to solve this incredibly challenging problem and make data curation easy for anyone who wants to train their own model on their own data.

This role is based in Redwood City, CA. We are in office 4 days a week.

About the Role

We're looking for an experienced Data Platform Engineer to join our core DatologyAI team. As one of our early senior hires, you will partner closely with our founders on the direction of our product and drive business-critical technical decisions. You will lead the development of our core product and data platform. These are key components of our stack that allow us to process customer data and apply state-of-the-art research for identifying the most informative data points in large-scale datasets. You will have a broad impact on our technology, our product, and our company's culture. We provide visa sponsorship for candidates selected for this role.

What You'll Work On

  • Design, build, and maintain highly scalable data processing solutions that are reliable and secure
  • Architect, build, and deploy the back-end systems and services that power our data curation platform
  • Partner with researchers and engineers to bring new features and research capabilities to our customers
  • Ensure that our systems are reliable, secure, and worthy of our customers' trust

About You

  • You have meaningful experience leading and building production data systems to deliver on major product initiatives.
  • You have built and managed highly scalable data processing solutions (e.g., Spark, Flink), data lakes or warehouses (e.g., Snowflake, Hive), and distributed storage systems (e.g., HDFS, S3); authored SQL queries; used workflow management tools (e.g., Airflow, Dagster); and maintained the infrastructure that supports them.
  • Proficiency in at least one programming language commonly used within Data Engineering, such as Python, Scala, or Java.
  • Expertise with an ETL scheduler such as Airflow, Dagster, or a similar framework.
  • Experience maintaining a high quality bar for design, correctness, and testing.
  • Take pride in building and operating scalable, reliable, secure systems
  • Have a humble attitude, an eagerness to help your colleagues, and a desire to do whatever it takes to make the team succeed
  • Own problems end-to-end, and are willing to pick up whatever knowledge you're missing to get the job done
  • You have experience being the technical lead of a Data Engineering / Platform / Infrastructure Team.
  • Experience building ML/DL systems and/or data infrastructure that feeds into training large ML models

Don't meet every single requirement? We still encourage you to apply. If you're excited about our mission and eager to learn, we want to hear from you!

Compensation

At DatologyAI, we are dedicated to rewarding talent with a highly competitive salary and significant equity. The base salary for this position ranges from $180,000 to $250,000. The candidate's starting pay will be determined based on job-related skills, experience, qualifications, and interview performance.

We offer a comprehensive benefits package to support our employees' well-being and professional growth:

  • 100% covered health benefits (medical, vision, and dental).
  • 401(k) plan with a generous 4% company match.
  • Unlimited PTO policy
  • Annual $2,000 wellness stipend.
  • Annual $1,000 learning and development stipend.
  • Daily lunches and snacks are provided in our office!
  • Relocation assistance for employees moving to the Bay Area.