Talent.com
Senior Data Engineer, Data Curation

Formation Bio, San Francisco, California, United States
30+ days ago
Job type
  • Full-time
Job description

About Formation Bio

Formation Bio is a tech- and AI-driven pharma company differentiated by radically more efficient drug development.

Advancements in AI and drug discovery are creating more candidate drugs than the industry can progress, because clinical trials are so costly and slow. Recognizing that this development bottleneck may ultimately limit the number of new medicines that can reach patients, Formation Bio, founded in 2016 as TrialSpark Inc., has built technology platforms, processes, and capabilities to accelerate all aspects of drug development and clinical trials. Formation Bio partners with pharma companies, research organizations, and biotechs, or acquires or in-licenses drugs from them, to develop programs past clinical proof of concept and beyond, ultimately helping to bring new medicines to patients. The company is backed by investors across pharma and tech, including a16z, Sequoia, Sanofi, Thrive Capital, Sam Altman, John Doerr, Spark Capital, SV Angel Growth, and others.

You can read more at the following links:

  • Our Vision for AI in Pharma
  • Our Current Drug Portfolio
  • Our Technology & Platform

At Formation Bio, our values are the driving force behind our mission to revolutionize the pharma industry. Every team and individual at the company shares these values and plays a key part in our mission to bring new treatments to patients faster and more efficiently.

About the Position

As a Senior Data Engineer at Formation Bio, you will focus on building the semantic layer that makes diverse data pillars interoperable, consistent, and actionable. You’ll work across healthcare (EHR, claims, real-world data), commercial/pharma (pricing, formulary, market data), biomedical (scientific and trial data), and finance (operational and business datasets) to design models that unify disparate sources into a common language for analytics, decision-making, and AI applications.

While ingestion pipelines are part of the work, your primary responsibility will be transforming both structured and unstructured data into scalable, ontology-driven data models that teams can trust and reuse. This includes everything from traditional relational datasets to text-heavy unstructured sources that feed NLP, embeddings, and semantic search.

This role requires partnering closely with engineers, analysts, data scientists, and business stakeholders to ensure every data pillar is represented in a robust semantic foundation that supports today’s needs and tomorrow’s AI-native platforms.

Responsibilities

  • Semantic Modeling & Ontologies: Build and maintain SQL/dbt models that unify datasets across healthcare, commercial/pharma, biomedical, and finance domains, leveraging ontologies and data standards (e.g., SNOMED CT, ICD, RxNorm, HL7 FHIR, OMOP).
  • Structured + Unstructured Data Integration: Design models that handle not only structured datasets but also unstructured data sources (e.g., documents, free text, biomedical literature), preparing them for AI-driven applications.
  • Data Layer Architecture: Own and evolve the semantic layer that transforms raw data into consistent, reusable models powering analytics and advanced AI.
  • Ingestion & Integration: Contribute to pipelines that bring in data from APIs, partner feeds, flat files, and unstructured text, ensuring inputs are reliable, well-documented, and metadata-rich.
  • Data Quality & FAIR Principles: Apply FAIR principles (findable, accessible, interoperable, reusable) so that data is traceable, interoperable, and reusable across structured and unstructured domains.
  • Cross-functional Collaboration: Partner with commercial, scientific, finance, and healthcare stakeholders to align semantic models with real-world use cases.
  • Enablement & Documentation: Document data standards and reusable modeling patterns to empower downstream teams and reduce cognitive load.
  • Future-Proofing: Anticipate how today’s semantic modeling will support tomorrow’s AI workflows, such as NLP, embeddings, knowledge graphs, and retrieval-augmented generation.

About You

Required Experience

  • 5+ years of experience as a Data Engineer, Analytics Engineer, or similar role in healthcare, pharma, biotech, finance, or other highly regulated industries.
  • Deep expertise in at least one data domain (e.g., healthcare/EHR/claims, commercial/pharma, biomedical/scientific, or finance), with a track record of translating complex, domain-specific datasets into consistent and usable models.
  • Strong SQL and data modeling skills, with proven experience designing semantic or analytical layers.
  • Exposure to additional domains beyond your core area of expertise, and the ability to learn and adapt to new datasets quickly.
  • Experience working with both structured data (e.g., relational tables, APIs) and unstructured data (e.g., documents, free text, biomedical literature, healthcare notes).
  • Familiarity with healthcare/life sciences ontologies and standards (SNOMED CT, ICD, RxNorm, LOINC, HL7 FHIR, OMOP, Mondo) and/or financial/commercial taxonomies.

Preferred Experience (Valued but Not Required)

  • Hands-on experience with Snowflake, dbt, Dagster, and modern data stacks.
  • Experience with unstructured data workflows (NLP, embeddings, semantic search, knowledge graphs).
  • Understanding of regulatory and compliance considerations in healthcare, pharma, or finance.
  • Practical use of metadata management and data catalog platforms.
  • Hands-on experience structuring dbt projects with testing, quality checks, and reusable design patterns.

Key Attributes

  • Curious & Investigative – Always looking deeper into how and why datasets work the way they do.
  • Structured & Methodical – Brings rigor to semantic modeling, ontology mapping, and data quality management.
  • Collaborative Partner – Works seamlessly across pillars, enabling others while owning core responsibilities.
  • Adaptable – Leverages deep domain expertise while learning quickly in unfamiliar data areas.
  • Enablement-Minded – Strives to reduce complexity for downstream users by standardizing and documenting.
  • Future-Oriented – Builds today’s models with tomorrow’s AI-native and data-driven applications in mind.

Formation Bio is prioritizing hiring in key hubs, primarily the New York City and Boston metro areas, with additional growth in the Research Triangle (NC) and the San Francisco Bay Area. Please apply only if you reside in these locations or are willing to relocate.

Compensation

The target salary range for this role is $180,000 to $230,000.

Salary ranges are informed by a number of factors, including geographic location. The range provided covers base salary only. In addition to base salary, we offer equity, comprehensive benefits, generous perks, hybrid flexibility, and more. If this range doesn't match your expectations, please still apply, because we may have something else for you.

You will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status.