Job Title: Data Engineer – AI / ML Pipelines
Location: Seffner, FL
Work Model: Hybrid
Duration: CTH
Position Summary
The Data Engineer – AI / ML Pipelines plays a key role in designing, building, and maintaining scalable data infrastructure that powers analytics and machine learning initiatives. This position focuses on developing production-grade data pipelines that support end-to-end ML workflows—from data ingestion and transformation to feature engineering, model deployment, and monitoring.
The ideal candidate has hands-on experience working with operational systems such as Warehouse Management Systems (WMS) or ERP platforms, and is comfortable partnering closely with data scientists, ML engineers, and operational stakeholders to deliver high-quality, ML-ready datasets.
Key Responsibilities
ML-Focused Data Engineering
- Build, optimize, and maintain data pipelines specifically designed for machine learning workflows.
- Collaborate with data scientists to develop feature sets, implement data versioning, and support model training, evaluation, and retraining cycles.
- Participate in initiatives involving feature stores, model input validation, and monitoring of data quality feeding ML systems.
Data Integration from Operational Systems
- Ingest, normalize, and transform data from WMS, ERP, telemetry, and other operational data sources.
- Model and enhance operational datasets to support real-time analytics and predictive modeling use cases.
Pipeline Automation & Orchestration
- Build automated, reliable, and scalable pipelines using tools such as Azure Data Factory, Airflow, or Databricks Workflows.
- Ensure data availability, accuracy, and timeliness across both batch and streaming systems.
Data Governance & Quality
- Implement validation frameworks, anomaly detection, and reconciliation processes to ensure high-quality ML inputs.
- Support metadata management, lineage tracking, and documentation of governed, auditable data flows.
Cross-Functional Collaboration
- Work closely with data scientists, ML engineers, software engineers, and business teams to gather requirements and deliver ML-ready datasets.
- Translate modeling and analytics needs into efficient, scalable data architecture solutions.
Documentation & Mentorship
- Document data flows, data mappings, and pipeline logic in a clear, reproducible format.
- Provide guidance and mentorship to junior engineers and analysts on ML-focused data engineering best practices.
Required Qualifications
Technical Skills
- Strong experience building ML-focused data pipelines, including feature engineering and model lifecycle support.
- Proficiency in Python, SQL, and modern data transformation tools (dbt, Spark, Delta Lake, or similar).
- Solid understanding of orchestrators and cloud data platforms (Azure, Databricks, etc.).
- Familiarity with ML operations tools such as MLflow, TFX, or equivalent frameworks.
- Hands-on experience working with WMS or operational / logistics data.
Experience
- 5+ years in data engineering, with at least 2 years directly supporting AI / ML applications or teams.
- Experience designing and maintaining production-grade pipelines in cloud environments.
- Proven ability to collaborate with data scientists and translate ML requirements into scalable data solutions.
Education & Credentials
- Bachelor’s degree in Computer Science, Data Engineering, Data Science, or a related field (Master’s preferred).
- Relevant certifications are a plus (e.g., Azure AI Engineer, Databricks ML, Google Professional Data Engineer).
Preferred Qualifications
- Experience with real-time ingestion using Kafka, Kinesis, Event Hub, or similar.
- Exposure to MLOps practices and CI / CD for data pipelines.
- Background in logistics, warehousing, fulfillment, or similar operational domains.