Hadoop Developer

Indotronix Avani Group, Pittsburgh, PA, US
Job type
  • Full-time
Job Description

Job Title: Sr. Hadoop Developer

Location: Pittsburgh, PA / Strongsville, OH (hybrid; minimum three days in office, subject to change)

Duration: Long-term

Potential for Contract Extension: Yes

W2 only

This is a contract position with the right to hire if a need arises. The manager will consider only candidates who are open to converting to a full-time employee. The client will not sponsor work visas if the decision is made to hire the contingent worker.

Industry Background: Technical

Team Dynamic: 12 people (BSAs, Scrum Master, junior and senior engineers, and DevOps); full-stack team

Roles and Responsibilities:

Optimize and maintain large-scale feature engineering jobs using PySpark, Pandas, and PyArrow on Hadoop-based infrastructure (a minimal sketch follows this list).

Refactor and modularize ML codebases to improve reusability, maintainability, and performance.

Collaborate with platform teams to manage compute capacity, resource allocation, and system updates.

Integrate with existing Model Serving Framework to support testing, deployment, and rollback of ML workflows.

Monitor and troubleshoot production ML pipelines, ensuring high reliability, low latency, and cost efficiency.

Contribute to internal Model Serving Framework by sharing insights, proposing and implementing improvements, and documenting best practices.

(Nice to Have) Experience implementing near-real-time ML pipelines using Kafka and Spark Streaming for low-latency use cases. Experience with AWS and the SageMaker MLOps ecosystem.
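For context, here is a minimal sketch of the kind of feature-engineering job described in the first responsibility above: a PySpark pipeline that aggregates raw events and derives a per-customer rolling feature with a grouped Pandas transform, using Arrow for the Spark-to-Pandas exchange. This assumes a Spark 3.x cluster; the table, path, and column names (raw.customer_events, customer_id, amount) are hypothetical placeholders, not the client's actual schema.

import pandas as pd
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("customer-feature-job")
    # Arrow speeds up the Spark <-> Pandas conversion used by applyInPandas.
    .config("spark.sql.execution.arrow.pyspark.enabled", "true")
    .getOrCreate()
)

# Aggregate raw events (hypothetical Hive table) to daily spend per customer.
daily = (
    spark.read.table("raw.customer_events")
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("customer_id", "event_date")
    .agg(F.sum("amount").alias("daily_amount"))
)

def add_rolling_features(pdf: pd.DataFrame) -> pd.DataFrame:
    # Runs once per customer_id group as a plain Pandas DataFrame.
    pdf = pdf.sort_values("event_date")
    pdf["amount_7d_mean"] = pdf["daily_amount"].rolling(7, min_periods=1).mean()
    return pdf

schema = ("customer_id string, event_date date, "
          "daily_amount double, amount_7d_mean double")
features = daily.groupBy("customer_id").applyInPandas(add_rolling_features, schema)

# Partitioned Parquet output keeps downstream training reads cheap on HDFS.
features.write.mode("overwrite").partitionBy("event_date").parquet(
    "hdfs:///features/customer_daily"
)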

Must-Have Technical Skills:

Expert-level proficiency in Python, with strong experience in Pandas, PySpark, and PyArrow.

Expert-level proficiency in the Hadoop ecosystem, distributed computing, and performance tuning.

5+ years of experience in software engineering, data engineering, or MLOps roles.

Experience with CI/CD tools and best practices in ML environments.

Experience with monitoring tools and techniques for ML pipeline health and performance.

Strong collaboration skills, especially in cross-functional environments involving platform and data science teams.

Flex Skills / Nice to Have:

Experience contributing to internal MLOps frameworks or platforms.

Familiarity with SLURM clusters or other distributed job schedulers.

Exposure to Kafka, Spark Streaming, or other real-time data processing tools (a minimal sketch follows this list).

Knowledge of model lifecycle management, including versioning, deployment, and drift detection.
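To make the real-time item above concrete, here is a minimal sketch of a near-real-time pipeline with Kafka and Spark Structured Streaming: it reads JSON events from a topic, applies a stand-in scoring step, and writes results back to Kafka. The broker address, topic names, and the placeholder scoring logic are hypothetical, and the job assumes the spark-sql-kafka connector is on the classpath.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("nrt-scoring").getOrCreate()

event_schema = StructType([
    StructField("customer_id", StringType()),
    StructField("amount", DoubleType()),
])

# Read raw JSON events from a hypothetical Kafka topic.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "customer-events")            # placeholder topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Stand-in for a real model call; in practice this step would invoke the
# model-serving framework or apply a broadcast model artifact.
scored = events.withColumn("score", F.col("amount") / 100.0)

# Write scores back to Kafka; the checkpoint makes the query restartable.
query = (
    scored.selectExpr("customer_id AS key", "to_json(struct(*)) AS value")
    .writeStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("topic", "customer-scores")
    .option("checkpointLocation", "hdfs:///checkpoints/nrt-scoring")
    .start()
)
query.awaitTermination()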

Soft Skills:

Strong collaboration skills, especially in cross-functional environments involving platform and data science teams.

Strong written and verbal communication skills.

Education / Certifications: Bachelor's degree minimum

Screening Questions:

1. Describe in detail the differences between Python, PySpark, Hadoop, Spark, and Impala, and explain when you would use each one.

2. What types of models did you deploy, and how were they used by the business?
