City : Seattle, WA / Glendale, CA / Burbank, CA / Santa Monica, CA
Onsite / Hybrid / Remote : Onsite (4 days a week onsite)
Duration : 19 months
Rate Range : Up to $92.50 / hr on W2 depending on experience (no C2C or 1099 or sub-contract)
Work Authorization : GC, USC, All valid EADs except OPT, CPT, H1B
Must Have :
- Spark or Flink or Beam or Kafka Streams
- Real-time streaming pipeline experience
- SQL and large-scale schema design
- AWS or equivalent cloud platforms
- Python, Java, or Scala
- ML workflow support (feature engineering, data validation, etc.)
Responsibilities :
- Design, build, and maintain scalable batch and real-time data pipelines for user interaction data, metadata, and model features
- Develop and operate offline and real-time feature stores to support ML inference and training workflows
- Partner with ML engineers to define data schemas and validation logic and to build production-ready datasets
- Implement monitoring and observability for data quality and pipeline reliability
- Optimize data workflows for speed, cost, and scalability across large-scale datasets
- Translate personalization requirements into robust data infrastructure solutions
- Participate in the selection and adoption of modern data tooling and cloud-native technologies
- Contribute to the team’s technical excellence through code reviews and design discussions
Qualifications :
- Bachelor’s or Master’s in Computer Science, Data Engineering, or related technical field
- 5+ years of experience building production-grade distributed data systems
- Expertise in modern data frameworks (Spark, Flink, Beam, Kafka Streams)
- Proficient in SQL and schema design for high-scale data environments
- Strong programming skills in Python, Java, or Scala
- Experience with cloud platforms such as AWS and infrastructure components like data lakes, warehouses, and feature stores
- Proven ability to support machine learning data workflows and pipelines
Preferred :
- Experience in building ML infrastructure for personalization or recommendation systems
- Familiarity with MLOps tools (MLflow, TFX, Kubeflow)
- Hands-on knowledge of real-time ML serving and online feature generation
- Prior work in early-stage or 0→1 product development environments
- Strong cross-functional collaboration with ML and product teams