Job Title : Sr Hadoop Developer
Location : Pittsburgh, PA / Strongsville, OH - Hybrid, minimum 3 days per week in office (subject to change)
Duration : Long term
Potential for Contract Extension : yes
W2 only
This position is a contract role with the right to hire if a need becomes available. The manager will only consider candidates who are open to converting to full-time employment. The client will not sponsor work visas if the decision is made to hire the contingent worker.
Industry Background : Technical
Team Dynamic : 12 people, including BSAs, a Scrum Master, Jr. and Sr. engineers, and DevOps; a full-stack team
Roles and Responsibilities :
Optimize and maintain large-scale feature engineering jobs using PySpark, Pandas, and PyArrow on Hadoop-based infrastructure (a brief illustrative sketch follows this list).
Refactor and modularize ML codebases to improve reusability, maintainability, and performance.
Collaborate with platform teams to manage compute capacity, resource allocation, and system updates.
Integrate with the existing Model Serving Framework to support testing, deployment, and rollback of ML workflows.
Monitor and troubleshoot production ML pipelines, ensuring high reliability, low latency, and cost efficiency.
Contribute to the internal Model Serving Framework by sharing insights, proposing and implementing improvements, and documenting best practices.
(Nice to Have) Experience implementing near-real-time ML pipelines using Kafka and Spark Streaming for low-latency use cases. Experience with AWS and the SageMaker MLOps ecosystem.
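For illustration only, a minimal sketch of the kind of batch feature-engineering job described above, written in PySpark. The HDFS paths, table layout, and column names are invented for this example and do not reflect the client's actual codebase:

    from pyspark.sql import SparkSession, functions as F

    # Hypothetical feature-engineering job; paths and columns are invented.
    spark = (
        SparkSession.builder
        .appName("feature-engineering-sketch")
        # Arrow-backed conversion speeds up Spark <-> Pandas interchange.
        .config("spark.sql.execution.arrow.pyspark.enabled", "true")
        .getOrCreate()
    )

    # Read raw events from a Hadoop-backed Parquet dataset (assumed to exist).
    events = spark.read.parquet("hdfs:///data/raw/events")

    # Aggregate per-customer features with built-in Spark SQL functions.
    features = (
        events.groupBy("customer_id")
        .agg(
            F.count("*").alias("event_count"),
            F.avg("amount").alias("avg_amount"),
            F.max("event_ts").alias("last_event_ts"),
        )
    )

    # Write the output for downstream ML training jobs.
    features.write.mode("overwrite").parquet("hdfs:///data/features/customer")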
Must Have Technical Skills :
Expert-level proficiency in Python, with strong experience in Pandas, PySpark, and PyArrow.
Expert-level proficiency in the Hadoop ecosystem, distributed computing, and performance tuning.
5+ years of experience in software engineering, data engineering, or MLOps roles.
Experience with CI/CD tools and best practices in ML environments.
Experience with monitoring tools and techniques for ML pipeline health and performance.
Strong collaboration skills, especially in cross-functional environments involving platform and data science teams.
Flex Skills / Nice to Have :
Experience contributing to internal MLOps frameworks or platforms.
Familiarity with SLURM clusters or other distributed job schedulers.
Exposure to Kafka, Spark Streaming, or other real-time data processing tools (see the streaming sketch after this list).
Knowledge of model lifecycle management, including versioning, deployment, and drift detection.
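As a point of reference for the streaming nice-to-have, a minimal Spark Structured Streaming sketch reading from Kafka. The broker address, topic name, and schema are hypothetical, and the spark-sql-kafka connector package is assumed to be available on the cluster:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    # Hypothetical near-real-time pipeline; broker, topic, and schema are invented.
    spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

    schema = StructType([
        StructField("customer_id", StringType()),
        StructField("amount", DoubleType()),
    ])

    # Consume JSON events from a Kafka topic.
    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "events")
        .load()
    )

    # Kafka delivers bytes; cast the value to string and parse the JSON payload.
    parsed = (
        raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
        .select("e.*")
    )

    # Maintain running per-customer totals across micro-batches.
    totals = parsed.groupBy("customer_id").agg(F.sum("amount").alias("total_amount"))

    # Console sink is used here purely for demonstration.
    query = totals.writeStream.outputMode("complete").format("console").start()
    query.awaitTermination()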
Soft Skills : Strong written and verbal communication skills
Education / Certifications : Bachelor's degree minimum
Screening Questions :