Job Description
About the role
You’ll build the data backbone that powers our keyword→auto-script machine. Your work
ensures reliable Semrush / Search Console ingestion, clean schemas, fast feature access, and
robust scheduling / monitoring, so models and scripts run on time, every time.
What you’ll do
- Build / own connectors: Semrush API, Google Search Console, internal logs; schedule
with Airflow / Prefect (see the DAG sketch after this list).
- Design schemas and tables for raw, curated, and feature layers (warehouse + Postgres).
- Implement data quality checks (freshness, completeness, duplicates, ontology mappings) with alerts (sketch below).
- Stand up and tune vector infrastructure (pgvector / Pinecone) with indexing and retention policies (sketch below).
- Expose clean datasets and features to ML services (privacy-aware, audit-ready).
- Optimize cost / perf (partitions, clustering, caching, job concurrency) and SLAs for daily / weekly runs.
- Build simple observability dashboards (job health, latency, data drift signals).
- Partner with ML / NLP on retraining pipelines and with Compliance on audit logs / versioning.
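To ground the connector / scheduling bullet above, here is a minimal sketch of the kind of daily ingestion DAG this role would own, assuming Airflow 2.x. The DAG id, schedule, and task functions are illustrative placeholders, not an existing internal pipeline.

```python
# Minimal sketch of a scheduled Semrush ingestion DAG (Airflow 2.x assumed).
# dag_id, schedule, and the two task functions are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_semrush(**context):
    """Pull the previous day's keyword exports from the Semrush API (placeholder)."""
    ...


def load_to_raw(**context):
    """Land the raw payloads in the warehouse raw layer (placeholder)."""
    ...


with DAG(
    dag_id="semrush_daily_ingest",          # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="0 4 * * *",                   # daily run, early morning UTC
    catchup=True,                           # lets Airflow backfill missed days
    default_args={"retries": 3, "retry_delay": timedelta(minutes=10)},
) as dag:
    extract = PythonOperator(task_id="extract_semrush", python_callable=extract_semrush)
    load = PythonOperator(task_id="load_to_raw", python_callable=load_to_raw)
    extract >> load
```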
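For the data quality bullet, a bare-bones sketch of post-load freshness and duplicate checks against Postgres; the table, columns, and check names are hypothetical, and in practice these would likely live in Great Expectations or dbt tests.

```python
# Sketch of post-load data quality checks; raw.semrush_keywords and its
# columns are illustrative, not the team's actual schema.
import psycopg2

CHECKS = {
    # Has anything landed in the last day?
    "freshness": """
        SELECT max(loaded_at) > now() - INTERVAL '1 day'
        FROM raw.semrush_keywords
    """,
    # Are (keyword, export_date) pairs unique?
    "no_duplicates": """
        SELECT count(*) = count(DISTINCT (keyword, export_date))
        FROM raw.semrush_keywords
    """,
}


def run_checks(dsn: str) -> dict[str, bool]:
    """Return a pass/fail map; the caller can page an alert on any failure."""
    results = {}
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        for name, sql in CHECKS.items():
            cur.execute(sql)
            results[name] = bool(cur.fetchone()[0])
    return results
```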
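And for the vector infrastructure bullet, a sketch of provisioning a pgvector-backed embedding table with an HNSW index and a simple retention sweep. It assumes pgvector 0.5+; the table, columns, DSN, and 180-day retention window are illustrative only.

```python
# Sketch of pgvector provisioning: embedding table, HNSW index, retention sweep.
# All object names and the retention policy are hypothetical.
import psycopg2

DDL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS keyword_embeddings (
    keyword_id   BIGINT PRIMARY KEY,
    embedding    vector(768),          -- dimension depends on the embedding model
    created_at   TIMESTAMPTZ DEFAULT now()
);

-- HNSW index for cosine-similarity search (pgvector >= 0.5.0)
CREATE INDEX IF NOT EXISTS keyword_embeddings_hnsw
    ON keyword_embeddings USING hnsw (embedding vector_cosine_ops);

-- Simple retention: drop embeddings older than 180 days (illustrative policy)
DELETE FROM keyword_embeddings WHERE created_at < now() - INTERVAL '180 days';
"""

with psycopg2.connect("dbname=features") as conn, conn.cursor() as cur:  # placeholder DSN
    cur.execute(DDL)
```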
What you’ve done
- 3+ years as a Data Engineer (ETL / ELT in production).
- Strong Python + SQL; experience with Airflow / Prefect; dbt is a nice-to-have.
- Worked with cloud warehouses (BigQuery / Snowflake / Redshift) and Postgres.
- Built resilient API ingestions with pagination, rate limits, retries, and backfills (see the sketch after this list).
- Experience with data testing / validation (Great Expectations, dbt tests, or similar).
- Bonus: vector DB ops, GCP / AWS, event streaming (Kafka / PubSub), healthcare data hygiene.
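As a reference point for the resilient-ingestion bullet, here is a minimal sketch of a paginated pull with exponential backoff and per-date reruns for backfills. The endpoint, query parameters, and response shape are hypothetical and do not reflect the real Semrush API contract.

```python
# Sketch of a resilient paginated ingestion loop: retries with backoff on
# rate limits / server errors, offset pagination, rerunnable per date.
import time

import requests

BASE_URL = "https://api.example.com/v1/keyword-export"  # placeholder, not the real Semrush endpoint


def fetch_page(session, params, max_retries=5):
    """GET one page, backing off exponentially on 429s and 5xx responses."""
    for attempt in range(max_retries):
        resp = session.get(BASE_URL, params=params, timeout=30)
        if resp.status_code == 429 or resp.status_code >= 500:
            time.sleep(2 ** attempt)  # simple exponential backoff
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError(f"Exhausted retries for params {params}")


def ingest(export_date, page_size=1000):
    """Pull every page for one logical date; rerunnable per date for backfills."""
    rows, offset = [], 0
    with requests.Session() as session:
        while True:
            page = fetch_page(
                session, {"date": export_date, "offset": offset, "limit": page_size}
            )
            batch = page.get("data", [])
            if not batch:
                break
            rows.extend(batch)
            offset += len(batch)
    return rows
```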
How we’ll measure success (first 90 days)
- Reliable daily Semrush / GSC loads with a 99% on-time SLA and data quality checks in place.
- Curated tables powering clustering / intent models, with documented lineage.
- Feature / embedding store online with 200ms p95 reads for model services.

Tech you’ll touch
Python, SQL, Airflow / Prefect, Postgres, Warehouse (BigQuery / Snowflake / Redshift), dbt
(optional), Great Expectations, Docker, Terraform (nice-to-have), pgvector / Pinecone.