Talent.com
Data Engineer (Pipelines, Quality, Orchestration)
Medical Device Company • Dallas, TX, US
1 day ago
Job type
  • Full-time
Job description
About the role

You’ll build the data backbone that powers our keyword→auto-script machine. Your work ensures reliable Semrush/Search Console ingestion, clean schemas, fast feature access, and robust scheduling and monitoring, so models and scripts run on time, every time.

What you’ll do

  • Build and own connectors: Semrush API, Google Search Console, internal logs; schedule with Airflow/Prefect.
  • Design schemas and tables for raw, curated, and feature layers (warehouse + Postgres).
  • Implement data quality checks (freshness, completeness, duplicates, ontology mappings) with alerts.
  • Stand up and tune vector infrastructure (pgvector/Pinecone) with indexing and retention policies.
  • Expose clean datasets and features to ML services (privacy-aware, audit-ready).
  • Optimize cost/performance (partitions, clustering, caching, job concurrency) and SLAs for daily/weekly runs.
  • Build simple observability dashboards (job health, latency, data drift signals).
  • Partner with ML/NLP on retraining pipelines and with Compliance on audit logs/versioning.
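To give a flavor of the data quality work above, the core checks (freshness, completeness, duplicates) can be sketched in plain Python. In practice these would run as Great Expectations suites or dbt tests wired to alerts; the function name `run_quality_checks` and the `loaded_at`/`keyword`/`volume` fields below are illustrative assumptions, not our actual schema.

```python
from datetime import datetime, timedelta, timezone

def run_quality_checks(rows, now=None, max_age_hours=24,
                       required_fields=("keyword", "volume")):
    """Freshness / completeness / duplicate checks on ingested rows.

    Each row is a dict with a 'loaded_at' ISO-8601 timestamp and a
    'keyword' natural key (hypothetical schema for illustration).
    Returns a mapping of check name -> list of offending row indexes.
    """
    now = now or datetime.now(timezone.utc)
    failures = {"stale": [], "incomplete": [], "duplicate": []}
    seen = set()
    for i, row in enumerate(rows):
        # Freshness: row must have loaded within the SLA window.
        loaded = datetime.fromisoformat(row["loaded_at"])
        if now - loaded > timedelta(hours=max_age_hours):
            failures["stale"].append(i)
        # Completeness: required fields must be present and non-empty.
        if any(row.get(f) in (None, "") for f in required_fields):
            failures["incomplete"].append(i)
        # Duplicates: natural key must be unique within the batch.
        key = row.get("keyword")
        if key in seen:
            failures["duplicate"].append(i)
        seen.add(key)
    return failures
```

A scheduler task would call this after each load and page an alert when any failure list is non-empty.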

What you’ve done

  • 3+ years as a Data Engineer (ETL/ELT in production).
  • Strong Python + SQL; experience with Airflow/Prefect; dbt is a nice-to-have.
  • Worked with cloud warehouses (BigQuery/Snowflake/Redshift) and Postgres.
  • Built resilient API ingestions with pagination, rate limits, retries, and backfills.
  • Experience with data testing/validation (Great Expectations, dbt tests, or similar).
  • Bonus: vector DB ops, GCP/AWS, event streaming (Kafka/PubSub), healthcare data hygiene.

How we’ll measure success (first 90 days)

  • Reliable daily Semrush/GSC loads with 99% on-time SLA and data quality checks.
  • Curated tables powering clustering/intent models with documented lineage.
  • Feature/embedding store online with 200ms p95 reads for model services.

Tech you’ll touch

Python, SQL, Airflow/Prefect, Postgres, Warehouse (BigQuery/Snowflake/Redshift), dbt (optional), Great Expectations, Docker, Terraform (nice-to-have), pgvector/Pinecone.
