Senior Data Engineer - GC / USC only
Washington, DC & New York (6-8 month contract, possible extensions)
About the Role
We are seeking an experienced Senior Data Engineer to design, build, and optimize the data systems that power our machine learning models, segmentation analytics, and measurement pipelines.
This hands-on role will focus on building and maintaining our Feature Store, creating robust data pipelines to support training and inference workflows, and ensuring that our ML and analytics code runs efficiently and at scale across Databricks and AWS environments.
You'll collaborate closely with data scientists, ML engineers, and product teams to turn analytical concepts into production-ready data and model pipelines that drive personalization, audience targeting, and performance insights across the business.
What You'll Do
Design, build, and optimize high-performance data pipelines and feature store components using Databricks (PySpark, Delta Lake, SQL) and AWS (S3, Lambda, Glue, Kinesis, EMR).
Develop and maintain a centralized Feature Store to manage and serve machine learning features consistently across training and inference environments.
Build data pipelines for audience segmentation, measurement, and model performance tracking, ensuring accuracy and scalability.
Optimize existing code and pipelines for performance, cost efficiency, and maintainability - reducing compute time, improving Spark job performance, and minimizing data latency.
Collaborate with data scientists to productionize ML models, including model ingestion, transformation, and deployment pipelines.
Implement CI/CD workflows for data and ML deployments using Databricks Workflows, GitHub Actions, or similar automation tools.
Develop and enforce data quality, lineage, and observability frameworks (e.g., Great Expectations, Monte Carlo, Soda).
Work with cloud infrastructure teams to ensure reliable production environments and efficient resource utilization.
Contribute to code reviews, documentation, and data architecture design, promoting best practices for performance and scalability.
Stay current with Databricks, Spark, and AWS ecosystem updates to continuously improve platform efficiency.
What You'll Need
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
7-10+ years of experience in data engineering or distributed data systems development.
Deep expertise with Databricks (PySpark, Delta Lake, SQL) and strong experience with AWS (S3, Glue, EMR, Kinesis, Lambda).
Experience designing and building Feature Stores (Databricks Feature Store, Feast, or similar).
Proven ability to profile and optimize data processing code, including Spark tuning, partitioning strategies, and efficient data I/O.
Strong programming skills in Python (preferred) or Scala/Java, with an emphasis on writing performant, production-ready code.
Experience with batch and streaming pipelines, real-time data processing, and large-scale distributed computing.
Familiarity with ML model deployment and monitoring workflows (MLflow, SageMaker, custom frameworks).
Familiarity with ML model development using libraries such as scikit-learn, TensorFlow, or PyTorch.
Working knowledge of data quality frameworks, CI/CD, and infrastructure-as-code.
Excellent problem-solving and communication skills; able to collaborate across technical and product domains.
Preferred Qualifications
Experience with Databricks Unity Catalog, Delta Live Tables, and MLflow.
Understanding of segmentation, targeting, and personalization pipelines.
Experience with data observability and monitoring tools (Monte Carlo, Databand, etc.).
Familiarity with NoSQL or real-time stores (DynamoDB, Druid, Redis, etc.) for feature serving.
Exposure to containerization and orchestration (Docker, Kubernetes, Airflow, Dagster).
Strong understanding of data performance optimization principles - caching, partitioning, vectorization, and adaptive query execution.