Job Description
Our client is scaling production ML systems and needs a hands-on engineer to help build, maintain, and run essential ML data pipelines. You’ll own high-throughput data ingestion and transformation workflows (including image- and array-type modalities), enforce rigorous data quality standards, and partner with research and platform teams to keep models fed with reliable, versioned datasets.
- Design, build, and operate reliable ML data pipelines for batch and/or streaming use cases across cloud environments.
- Develop robust ETL/ELT processes (ingest, validate, cleanse, transform, and publish) with clear SLAs and monitoring.
- Implement data quality gates (schema checks, null/outlier handling, drift and bias signals) and data versioning for reproducibility (see the sketch after this list).
- Optimize pipelines for distributed computing and large modalities (e.g., images, multi-dimensional arrays).
- Automate repetitive workflows with CI/CD and infrastructure-as-code; document, test, and harden for production.
- Collaborate with ML, Data Science, and Platform teams to align datasets, features, and model training needs.
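For illustration only, a minimal sketch of the kind of data quality gate referenced above, assuming a pandas-based batch step; the column names, dtypes, and the 3×IQR outlier threshold (user_id, price) are hypothetical placeholders, not part of the client’s actual stack.

```python
import pandas as pd

# Hypothetical quality gate run before a batch is published downstream.
# Column names, dtypes, and thresholds here are illustrative assumptions only.
EXPECTED_SCHEMA = {"user_id": "int64", "price": "float64"}


def quality_gate(df: pd.DataFrame) -> list:
    """Return a list of validation errors; an empty list means the batch passes."""
    errors = []

    # Schema check: required columns must exist with the expected dtype.
    for col, dtype in EXPECTED_SCHEMA.items():
        if col not in df.columns:
            errors.append(f"missing column: {col}")
        elif str(df[col].dtype) != dtype:
            errors.append(f"{col}: expected {dtype}, got {df[col].dtype}")

    # Null check: required columns that are present must not contain nulls.
    for col in EXPECTED_SCHEMA:
        if col in df.columns and df[col].isna().any():
            errors.append(f"{col}: {int(df[col].isna().sum())} null values")

    # Simple outlier signal: flag values far outside the interquartile range.
    if "price" in df.columns and pd.api.types.is_numeric_dtype(df["price"]):
        q1, q3 = df["price"].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask = (df["price"] < q1 - 3 * iqr) | (df["price"] > q3 + 3 * iqr)
        if mask.any():
            errors.append(f"price: {int(mask.sum())} potential outliers")

    return errors


if __name__ == "__main__":
    batch = pd.DataFrame({"user_id": [1, 2, 3], "price": [9.99, None, 12.50]})
    problems = quality_gate(batch)
    print("FAIL" if problems else "PASS", problems)
```

In a real pipeline, checks like these would typically be wired into the orchestration layer so that failing batches are quarantined and alerted on rather than published.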
Minimum Qualifications:
- 5+ years building and operating data pipelines in production.
- Cloud: Hands-on with AWS, Azure, or GCP services for storage, compute, orchestration, and security.
- Programming: Strong proficiency in Python and common data/ML libraries (pandas, NumPy, etc.).
- Distributed compute: Experience with at least one of Spark, Dask, or Ray.
- Modalities: Experience handling image-type and array-type data at scale.
- Automation: Proven ability to automate repetitive tasks (shell/Python scripting, CI/CD).
- Data Quality: Implemented validation, cleansing, and transformation frameworks in production.
- Data Versioning: Familiar with tools/practices such as DVC, LakeFS, or similar.
- Languages: Fluent in English or Farsi.
Strongly Preferred:
- SQL expertise (writing performant queries; optimizing on large datasets).
- Data warehousing/lakehouse concepts and tools (e.g., Snowflake/BigQuery/Redshift; Delta/Lakehouse patterns).
- Data virtualization/federation exposure (e.g., Presto/Trino) and semantic/metadata layers.
- Orchestration (Airflow, Dagster, Prefect) and observability/monitoring for data pipelines.
- MLOps practices (feature stores, experiment tracking, lineage, artifacts).
- Containers & IaC (Docker; Terraform/CloudFormation) and CI/CD for data/ML workflows.
- Testing for data/ETL (unit/integration tests, great_expectations or similar).
Soft Skills:
- Executes independently and creatively; comfortable owning outcomes in ambiguous environments.
- Proactive communicator who collaborates cross-functionally with DS/ML/Platform stakeholders.
Location: Seattle, WA
Duration: 1+ year
Pay: $56/hr