Data Integration Scientist

Stark Pharma Solutions Inc
MA, United States
Job type
  • Full-time
Job description

Position: Data Integration Scientist

Location: Cambridge, MA (Hybrid)

Experience: 5+ years

Duration: 2 months (contract)

Job Summary

We're looking for a Data Integration Scientist to design and build an internal data management system, modeled on NCBI's Gene Expression Omnibus (GEO), that organizes and integrates large-scale omics datasets. This role combines bioinformatics, data engineering, and data governance to create a central hub for raw and processed multi-omics data, making it accessible, searchable, and reusable across teams.

Key Responsibilities

Design and implement scalable data pipelines for ingestion, curation, and storage of omics datasets (e.g., bulk and single-cell RNA-seq, spatial omics, CyTOF).

Build and maintain a web-based data catalog or portal for dataset discovery and visualization of metadata and QC metrics.

Develop and enforce metadata standards, ontologies, and schemas to ensure data consistency and interoperability.

Implement access controls and permission systems for data security and compliance.

Collaborate with IT and research teams to integrate the system with existing compute and storage infrastructure.

Process, manage, and organize raw, large-scale biological datasets for internal use.

Qualifications

Education: BS (5+ years of experience) or MS (0-3 years of experience) in Bioinformatics, Computational Biology, Data Science, Computer Science, or a related field.

Strong programming and data engineering skills (Python, R, SQL/PostgreSQL).

Hands-on experience processing and managing omics datasets.

Experience developing or maintaining bioinformatics data pipelines.

Knowledge of metadata models, data provenance, and FAIR data principles.

Ability to collaborate effectively with cross-functional teams.

Preferred Technical Skills

Experience building web applications or dashboards with R Shiny.

Experience with cloud or HPC environments (AWS, GCP, on-prem).

Familiarity with workflow orchestration tools (Nextflow, Snakemake).

Knowledge of relational and NoSQL databases (PostgreSQL, MongoDB).

Familiarity with public repositories such as GEO or SRA and their metadata standards.

Proficiency with Git for version control.

Nice-to-Have Skills

Experience with containerization tools (Docker, Singularity).

Understanding of CI/CD workflows and web application frameworks.

Exposure to single-cell or multi-omics data integration.

Experience implementing data access and permission systems linked to identity management.

Top 5 Skills Needed

Strong programming and data engineering (Python, SQL, R)

Experience with omics data management and integration

Knowledge of metadata standards and biological ontologies

Data pipeline and repository development experience

Understanding of data governance and FAIR data principles
