Role : Sr. / Lead Data Engineer + AI
Location : Boston, MA - Remote
Experience Needed : 10 to 15 years for Lead / 5 to 10 years for Senior
A minimum of 3 years of experience as a Lead is required.
About the role :
We're looking for a Senior / Lead Data Engineer to build and scale our lakehouse and AI data pipelines on Databricks. You'll design robust ETL / ELT, enable feature engineering for ML / LLM use cases, and drive best practices for reliability, performance, and cost.
What you'll do :
Design, build, and maintain batch / streaming pipelines in Python + PySpark on Databricks (Delta Lake, Autoloader, Structured Streaming).
Implement data models (Bronze / Silver / Gold), optimize with partitioning, Z-ORDER, and indexing, and manage reliability (DLT / Jobs, monitoring, alerting).
Enable ML / AI : feature engineering, MLflow experiment tracking, model registries, and model / feature serving; support RAG pipelines (embeddings, vector stores).
Establish data quality checks (e.g., Great Expectations), lineage, and governance (Unity Catalog, RBAC).
Collaborate with Data Science / ML and Product to productionize models and AI workflows; champion CI / CD and IaC.
Troubleshoot performance and cost issues; mentor engineers and set coding standards.
Must-have qualifications :
10+ years in data engineering with a track record of production pipelines.
Expert in Python and PySpark (UDFs, Window functions, Spark SQL, Catalyst basics).
Deep hands-on Databricks : Delta Lake, Jobs / Workflows, Structured Streaming, SQL Warehouses; practical tuning and cost optimization.
Strong SQL and data modeling (dimensional, medallion, CDC).
ML / AI enablement experience : MLflow, feature stores, model deployment / monitoring; familiarity with LLM workflows (embeddings, vectorization, prompt / response logging).
Cloud proficiency on AWS / Azure / GCP (object storage, IAM, networking).