Job Title: Model Monitoring Engineer
Position Summary:
We are seeking a Senior Model Monitoring Engineer with deep expertise in large-scale ETL, lakehouse architectures, and streaming pipelines. The ideal candidate will ensure the reliability, scalability, and observability of data platforms powering analytics and ML workloads, while implementing monitoring solutions for model performance, pipeline health, and data quality.
Key Responsibilities:
End-to-End ETL Mastery: Architect, operate, and scale petabyte-class ETL/ELT pipelines across bronze/silver/gold layers for analytics and ML workloads.
Architecture Leadership: Define multi-year data platform roadmaps, align stakeholders, and enforce standards for quality, performance, and cost efficiency.
Spark-Centric Engineering: Build and operate Apache Spark workloads (batch and Structured Streaming), owning performance tuning, observability, and incident response.
Workflow Orchestration: Design and operate complex pipelines using Apache Airflow or equivalent orchestration tools, with robust SLA and dependency management.
Lakehouse & Storage Expertise: Work with data lakehouse technologies (Delta Lake, Apache Iceberg, Hudi) and columnar formats (Parquet, ORC).
Streaming & Messaging: Integrate high-volume telemetry or streaming data via Kafka, Kinesis, or Pulsar.
Software Engineering Rigor: Apply strong development practices in Python, Scala, or Java, including CI/CD, automated testing, and infrastructure-as-code.
Data Modeling: Design schemas and semantic layers optimized for ETL throughput, downstream analytics, and ML feature engineering.
Cloud-Native Platforms: Deploy pipelines on AWS, GCP, or Azure using services like EMR, Databricks, Glue, Dataflow, or BigQuery.
Operational Resilience: Leverage Kubernetes, containerization, and observability stacks (Prometheus, Grafana, ELK, OpenTelemetry) for monitoring and proactive issue resolution.
Model Monitoring: Implement monitoring and alerting solutions for pipeline failures, performance degradation, and anomalous data patterns.
Required Qualifications:
Bachelor's, Master's, or PhD in Computer Science, Data Engineering, Electrical Engineering, or a related field.
8+ years of experience delivering production ETL pipelines and lakehouse datasets for large-scale systems.
Strong hands-on experience with Spark, Airflow, and streaming frameworks.
Proficiency in Python, Scala, or Java for ETL, automation, and monitoring scripts.
Familiarity with cloud-native ETL deployments and telemetry monitoring.
Experience building operational monitoring dashboards and implementing proactive alerts for ML and ETL workloads.
Preferred Qualifications:
Experience working with large-scale telemetry data.
Familiarity with internal enterprise data platforms and standard development processes.
Experience in model performance monitoring and data quality tracking for ML workflows.
Location: CA, United States