Talent.com
Data Engineer, Scientific Data Ingestion
Mithrl Inc. • San Francisco, CA, United States
30+ days ago
Job type
  • Full-time
Job description

About Mithrl

We envision a world where novel drugs and therapies reach patients in months, not years, accelerating breakthroughs that save lives.

Mithrl is building the world’s first commercially available AI Co-Scientist—a discovery engine that empowers life science teams to go from messy biological data to novel insights in minutes. Scientists ask questions in natural language, and Mithrl answers with real analysis, novel targets, and patent‑ready reports.

Our traction speaks for itself:

12X year‑over‑year revenue growth

Trusted by leading biotechs and big pharma across three continents

Driving real breakthroughs from target discovery to patient outcomes.

What you will do

Build and own an AI‑powered ingestion & normalization pipeline to import data from a wide variety of sources — unprocessed Excel / CSV uploads, lab and instrument exports, as well as processed data from internal pipelines.

Develop robust schema mapping, coercion, and conversion logic (for example: unit normalization, metadata standardization, variable‑name harmonization, vendor‑instrument quirks, plate‑reader formats, reference‑genome or annotation updates, and batch‑effect correction).

Use LLM‑driven and classical data‑engineering tools to structure “semi‑structured” or messy tabular data — extracting metadata, inferring column roles / types, cleaning free‑text headers, fixing inconsistencies, and preparing final clean datasets.

Ensure that every transformation that should happen only once (normalization, coercion, batch‑correction) runs during ingestion, so that downstream analytics and the AI “Co‑Scientist” always work with clean, canonical data.

Build validation, verification, and quality‑control layers to catch ambiguous, inconsistent, or corrupt data before it enters the platform.

Collaborate with product teams, data science / bioinformatics colleagues, and infrastructure engineers to define and enforce data standards, and ensure pipeline outputs integrate cleanly into downstream analysis and storage systems.

What you bring

Must‑have

5+ years of experience in data engineering / data wrangling with real‑world tabular or semi‑structured data.

Strong fluency in Python and data‑processing tools (Pandas, Polars, PyArrow, or similar).

Extensive experience with messy Excel / CSV / spreadsheet‑style data (inconsistent headers, multiple sheets, mixed formats, free‑text fields) and with normalizing it into clean structures.

Comfort designing and maintaining robust ETL / ELT pipelines, ideally for scientific or lab‑derived data.

Ability to combine classical data engineering with LLM‑powered data normalization / metadata extraction / cleaning.

Strong desire and ability to own the ingestion & normalization layer end‑to‑end, from raw upload to final clean dataset, with an eye for maintainability, reproducibility, and scalability.

Good communication skills; able to collaborate across teams (product, bioinformatics, infra) and translate real‑world messy data problems into robust engineering solutions.

Nice‑to‑have

Familiarity with scientific data types and “modalities” (e.g., plate readers, genomics metadata, time series, batch information, instrumentation outputs).

Experience with workflow orchestration tools (e.g. Nextflow, Prefect, Airflow, Dagster), or building pipeline abstractions.

Experience with cloud infrastructure and data storage (AWS S3, data lakes / warehouses, database schemas) to support multi‑tenant ingestion.

Past exposure to LLM‑based data transformation or cleansing agents — building or integrating tools that clean or structure messy data automatically.

Any background in computational biology / lab data / bioinformatics is a bonus, though not required.

What you will love at Mithrl

Mission‑driven impact: you’ll be the gatekeeper of data quality, ensuring that all scientific data entering Mithrl becomes clean, consistent, and analysis‑ready. You’ll have outsized influence over the reliability and trustworthiness of our entire data + AI stack.

High ownership & autonomy: this role is yours to shape. You decide how ingestion works, define the standards, and build the pipelines. You’ll work closely with our product, data science, and infrastructure teams, shaping how data is ingested, stored, and exposed to end users or AI agents.

Team: join a tight‑knit, talent‑dense team of engineers, scientists, and builders.

Culture: we value consistency, clarity, and hard work. We solve hard problems through focused daily execution.

Speed: we ship fast (twice a week) and improve continuously based on real user feedback.

Location: beautiful SF office with a high‑energy, in‑person culture.

Benefits: comprehensive PPO health coverage through Anthem (medical, dental, and vision), plus a 401(k) with top‑tier plans.

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.
