Talent.com
Senior Data Engineer

Formation Bio
New York City, New York, United States
7 days ago
Job type
  • Full-time
Job description

About Formation Bio

Formation Bio is a tech- and AI-driven pharma company differentiated by radically more efficient drug development.

Advances in AI and drug discovery are creating more candidate drugs than the industry can advance, because clinical trials are costly and slow. Recognizing that this development bottleneck may ultimately limit the number of new medicines that can reach patients, Formation Bio, founded in 2016 as TrialSpark Inc., has built technology platforms, processes, and capabilities to accelerate all aspects of drug development and clinical trials. Formation Bio partners on, acquires, or in-licenses drugs from pharma companies, research organizations, and biotechs, then develops those programs through clinical proof of concept and beyond, ultimately helping to bring new medicines to patients. The company is backed by investors across pharma and tech, including a16z, Sequoia, Sanofi, Thrive Capital, Sam Altman, John Doerr, Spark Capital, SV Angel Growth, and others.

You can read more at the following links:

  • Our Vision for AI in Pharma
  • Our Current Drug Portfolio
  • Our Technology & Platform

At Formation Bio, our values are the driving force behind our mission to revolutionize the pharma industry. Every team and individual at the company shares these same values, and every team and individual plays a key part in our mission to bring new treatments to patients faster and more efficiently.

About the Position

As a Senior Data Engineer at Formation Bio, you will focus on building the semantic layer that makes diverse data pillars interoperable, consistent, and actionable. You’ll work across healthcare (EHR, claims, real-world data), commercial / pharma (pricing, formulary, market data), biomedical (scientific and trial data), and finance (operational and business datasets) to design models that unify disparate sources into a common language for analytics, decision-making, and AI applications.

While ingestion pipelines are part of the work, your primary responsibility will be transforming both structured and unstructured data into scalable, ontology-driven data models that teams can trust and reuse. This includes everything from traditional relational datasets to text-heavy unstructured sources that feed NLP, embeddings, and semantic search.

This role requires partnering closely with engineers, analysts, data scientists, and business stakeholders to ensure every data pillar is represented in a robust semantic foundation that supports today’s needs and tomorrow’s AI-native platforms.

Responsibilities

  • Semantic Modeling & Ontologies: Build and maintain SQL / dbt models that unify datasets across healthcare, commercial / pharma, biomedical, and finance domains, leveraging ontologies (e.g., SNOMED CT, ICD, RxNorm, HL7 FHIR, OMOP).
  • Structured + Unstructured Data Integration: Design models that handle not only structured datasets but also unstructured data sources (e.g., documents, free text, biomedical literature), preparing them for AI-driven applications.
  • Data Layer Architecture: Own and evolve the semantic layer that transforms raw data into consistent, reusable models powering analytics and advanced AI.
  • Ingestion & Integration: Contribute to pipelines that bring in data from APIs, partner feeds, flat files, and unstructured text, ensuring inputs are reliable, well-documented, and metadata-rich.
  • Data Quality & FAIR Principles: Apply FAIR principles to ensure data is traceable, interoperable, and reusable across structured and unstructured domains.
  • Cross-functional Collaboration: Partner with commercial, scientific, finance, and healthcare stakeholders to align semantic models with real-world use cases.
  • Enablement & Documentation: Document data standards and reusable modeling patterns to empower downstream teams and reduce cognitive load.
  • Future-Proofing: Anticipate how today’s semantic modeling will support tomorrow’s AI workflows such as NLP, embeddings, knowledge graphs, and retrieval-augmented generation.

About You

Required Experience:

  • 5+ years of experience as a Data Engineer, Analytics Engineer, or similar role in healthcare, pharma, biotech, finance, or other highly regulated industries.
  • Deep expertise in at least one data domain (e.g., healthcare / EHR / claims, commercial / pharma, biomedical / scientific, or finance), with a track record of translating complex, domain-specific datasets into consistent and usable models.
  • Strong SQL and data modeling skills, with proven experience designing semantic or analytical layers.
  • Exposure to additional domains beyond your core area of expertise, and the ability to learn and adapt to new datasets quickly.
  • Experience working with both structured data (e.g., relational tables, APIs) and unstructured data (e.g., documents, free text, biomedical literature, healthcare notes).
  • Familiarity with healthcare / life sciences ontologies (SNOMED CT, ICD, RxNorm, LOINC, HL7 FHIR, OMOP, Mondo) and / or financial / commercial taxonomies.

Preferred Experience (Valued but Not Required):

  • Hands-on experience with Snowflake, dbt, Dagster, and modern data stacks.
  • Experience with unstructured data workflows (NLP, embeddings, semantic search, knowledge graphs).
  • Understanding of regulatory and compliance considerations in healthcare, pharma, or finance.
  • Practical use of metadata management and data catalog platforms.
  • Hands-on experience structuring dbt projects with testing, quality checks, and reusable design patterns.

Key Attributes:

  • Curious & Investigative – Always looking deeper into how and why datasets work the way they do.
  • Structured & Methodical – Brings rigor to semantic modeling, ontology mapping, and data quality management.
  • Collaborative Partner – Works seamlessly across pillars, enabling others while owning core responsibilities.
  • Adaptable – Leverages deep domain expertise while learning quickly in unfamiliar data areas.
  • Enablement-Minded – Strives to reduce complexity for downstream users by standardizing and documenting.
  • Future-Oriented – Builds today’s models with tomorrow’s AI-native and data-driven applications in mind.

Formation Bio is prioritizing hiring in key hubs, primarily the New York City and Boston metro areas, with additional growth in the Research Triangle (NC) and San Francisco Bay Area. Please only apply if you reside in these locations or are willing to relocate.

Compensation:

The target salary range for this role is $180,000 - $230,000.

Salary ranges are informed by a number of factors including geographic location. The range provided includes base salary only. In addition to base salary, we offer equity, comprehensive benefits, generous perks, hybrid flexibility, and more. If this range doesn't match your expectations, please still apply because we may have something else for you.

You will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status.
