Job Description
About the role
You’ll build the data backbone that powers our keyword→auto-script machine. Your work
ensures reliable Semrush / Search Console ingestion, clean schemas, fast feature access, and
robust scheduling / monitoring, so models and scripts run on time, every time.
What you’ll do
- Build / own connectors: Semrush API, Google Search Console, internal logs; schedule
with Airflow / Prefect (see the DAG sketch after this list).
- Design schemas and tables for raw, curated, and feature layers (warehouse + Postgres).
- Implement data quality checks (freshness, completeness, duplicates, ontology mappings) with alerts (sketch below).
- Stand up and tune vector infrastructure (pgvector / Pinecone) with indexing and retention policies (sketch below).
- Expose clean datasets and features to ML services (privacy-aware, audit-ready).
- Optimize cost / perf (partitions, clustering, caching, job concurrency) and SLAs for daily / weekly runs.
- Build simple observability dashboards (job health, latency, data drift signals).
- Partner with ML / NLP on retraining pipelines and with Compliance on audit logs / versioning.
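To ground the connector / scheduling bullet above, here is a minimal sketch of the kind of daily ingestion DAG this role would own, assuming Airflow 2.x. The DAG id, schedule, and task functions are illustrative placeholders, not an existing internal pipeline.

```python
# Minimal sketch of a scheduled Semrush ingestion DAG (Airflow 2.x assumed).
# dag_id, schedule, and the two task functions are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_semrush(**context):
    """Pull the previous day's keyword exports from the Semrush API (placeholder)."""
    ...


def load_to_raw(**context):
    """Land the raw payloads in the warehouse raw layer (placeholder)."""
    ...


with DAG(
    dag_id="semrush_daily_ingest",          # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="0 4 * * *",                   # daily run, early morning UTC
    catchup=True,                           # lets Airflow backfill missed days
    default_args={"retries": 3, "retry_delay": timedelta(minutes=10)},
) as dag:
    extract = PythonOperator(task_id="extract_semrush", python_callable=extract_semrush)
    load = PythonOperator(task_id="load_to_raw", python_callable=load_to_raw)
    extract >> load
```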
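For the data quality bullet, a bare-bones sketch of post-load freshness and duplicate checks against Postgres; the table, columns, and check names are hypothetical, and in practice these would likely live in Great Expectations or dbt tests.

```python
# Sketch of post-load data quality checks; raw.semrush_keywords and its
# columns are illustrative, not the team's actual schema.
import psycopg2

CHECKS = {
    # Has anything landed in the last day?
    "freshness": """
        SELECT max(loaded_at) > now() - INTERVAL '1 day'
        FROM raw.semrush_keywords
    """,
    # Are (keyword, export_date) pairs unique?
    "no_duplicates": """
        SELECT count(*) = count(DISTINCT (keyword, export_date))
        FROM raw.semrush_keywords
    """,
}


def run_checks(dsn: str) -> dict[str, bool]:
    """Return a pass/fail map; the caller can page an alert on any failure."""
    results = {}
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        for name, sql in CHECKS.items():
            cur.execute(sql)
            results[name] = bool(cur.fetchone()[0])
    return results
```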
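And for the vector infrastructure bullet, a sketch of provisioning a pgvector-backed embedding table with an HNSW index and a simple retention sweep. It assumes pgvector 0.5+; the table, columns, DSN, and 180-day retention window are illustrative only.

```python
# Sketch of pgvector provisioning: embedding table, HNSW index, retention sweep.
# All object names and the retention policy are hypothetical.
import psycopg2

DDL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS keyword_embeddings (
    keyword_id   BIGINT PRIMARY KEY,
    embedding    vector(768),          -- dimension depends on the embedding model
    created_at   TIMESTAMPTZ DEFAULT now()
);

-- HNSW index for cosine-similarity search (pgvector >= 0.5.0)
CREATE INDEX IF NOT EXISTS keyword_embeddings_hnsw
    ON keyword_embeddings USING hnsw (embedding vector_cosine_ops);

-- Simple retention: drop embeddings older than 180 days (illustrative policy)
DELETE FROM keyword_embeddings WHERE created_at < now() - INTERVAL '180 days';
"""

with psycopg2.connect("dbname=features") as conn, conn.cursor() as cur:  # placeholder DSN
    cur.execute(DDL)
```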
What you’ve done
- 3+ years as a Data Engineer (ETL / ELT in production).
- Strong Python + SQL; experience with Airflow / Prefect; dbt is a nice-to-have.
- Worked with cloud warehouses (BigQuery / Snowflake / Redshift) and Postgres.
- Built resilient API ingestions with pagination, rate limits, retries, and backfills (see the sketch after this list).
- Experience with data testing / validation (Great Expectations, dbt tests, or similar).
- Bonus: vector DB ops, GCP / AWS, event streaming (Kafka / PubSub), healthcare data hygiene.
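As a reference point for the resilient-ingestion bullet, here is a minimal sketch of a paginated pull with exponential backoff and per-date reruns for backfills. The endpoint, query parameters, and response shape are hypothetical and do not reflect the real Semrush API contract.

```python
# Sketch of a resilient paginated ingestion loop: retries with backoff on
# rate limits / server errors, offset pagination, rerunnable per date.
import time

import requests

BASE_URL = "https://api.example.com/v1/keyword-export"  # placeholder, not the real Semrush endpoint


def fetch_page(session, params, max_retries=5):
    """GET one page, backing off exponentially on 429s and 5xx responses."""
    for attempt in range(max_retries):
        resp = session.get(BASE_URL, params=params, timeout=30)
        if resp.status_code == 429 or resp.status_code >= 500:
            time.sleep(2 ** attempt)  # simple exponential backoff
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError(f"Exhausted retries for params {params}")


def ingest(export_date, page_size=1000):
    """Pull every page for one logical date; rerunnable per date for backfills."""
    rows, offset = [], 0
    with requests.Session() as session:
        while True:
            page = fetch_page(
                session, {"date": export_date, "offset": offset, "limit": page_size}
            )
            batch = page.get("data", [])
            if not batch:
                break
            rows.extend(batch)
            offset += len(batch)
    return rows
```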
How we’ll measure success (first 90 days)
- Reliable daily Semrush / GSC loads with a 99% on-time SLA and data quality checks in place.
- Curated tables powering clustering / intent models, with documented lineage.
- Feature / embedding store online with 200ms p95 reads for model services.

Tech you’ll touch
Python, SQL, Airflow / Prefect, Postgres, Warehouse (BigQuery / Snowflake / Redshift), dbt
(optional), Great Expectations, Docker, Terraform (nice-to-have), pgvector / Pinecone.