The Big Data Developer will play a critical role on the Big Data engineering team, designing and implementing large-scale data processing systems that power scientific research and innovation. The ideal candidate has hands-on experience building enterprise-grade data pipelines, working with distributed systems, and optimizing data processing workflows.
This is a long-term project (12+ months) with high visibility, cutting-edge tools, and opportunities to influence technical direction.
- U.S. CITIZENSHIP OR PERMANENT RESIDENCY REQUIRED. VISA OR EAD STATUS IS NOT ACCEPTABLE FOR THIS POSITION. NO EXCEPTIONS.
What You’ll Do
Data Pipeline Design & Development
- Design, build, and deploy scalable data pipelines for ingesting, processing, transforming, and storing high-volume datasets.
- Implement streaming and batch-processing solutions using Hadoop, Spark, and cloud-based tools.

Data Architecture & Engineering
- Develop and maintain data architecture and data flow models.
- Ensure data reliability, accuracy, and integrity across all environments.
- Support data warehousing strategies and best practices.

Data Quality, Security & Compliance
- Implement automated data validation, error handling, and monitoring.
- Ensure compliance with internal security controls and regulatory standards.
- Partner with governance teams to enforce data quality and security guidelines.

Cross-Functional Collaboration
- Work closely with data scientists, analysts, product teams, and application developers.
- Translate business requirements into robust technical solutions.
- Participate in Agile ceremonies and contribute to technical design discussions.

Performance Optimization
- Tune Spark applications, Hadoop jobs, and distributed data systems for performance and cost efficiency.
- Troubleshoot bottlenecks and implement improvements to system performance.

Technical Leadership
- Provide mentorship to junior developers and contribute to coding standards, best practices, and technical documentation.

Required Skills & Qualifications
- 4+ years of Big Data development experience in Hadoop ecosystems
- 2+ years of hands-on development with Apache Spark
- Proficiency in Java, Scala, or Python
- Strong understanding of distributed systems, ETL, data warehousing, and data modeling concepts
- Experience with large-scale datasets, performance tuning, and troubleshooting
- Strong problem-solving, communication, and collaboration skills
- Bachelor’s degree in Computer Science, Engineering, or related discipline

Preferred Skills
- Experience working with AWS cloud services (EMR, S3, Lambda, Glue, etc.)
- Experience with Spark 3.x or 4.x
- Exposure to Kubernetes, Airflow, or similar orchestration tools
- Familiarity with CI/CD and DevOps automation for data engineering

Why This Opportunity Stands Out
- Long-term project stability (12+ months, with likely extension)
- Opportunity to work on high-impact scientific and research-driven datasets
- Hands-on cloud modernization (AWS) and next-generation big data tooling
- Collaborative and innovative engineering culture