Position : Data Engineer
Location : Princeton, NJ
Duration : 1 Year
Mandatory Skills
Key Skills & Technologies
Programming Languages : Python (primary), SQL
Cloud Platforms : AWS (S3, Glue, Lambda, Redshift, EC2, EMR)
Data Tools : Apache Spark, Pandas, PySpark, Airflow
Databases : PostgreSQL, MySQL, NoSQL (e.g. DynamoDB)
ETL & Workflow Orchestration : AWS Glue, Apache Airflow
Version Control : Git
DevOps & CI/CD : Basic understanding of CI/CD pipelines and infrastructure as code (e.g. Terraform, CloudFormation)
Job Description
Data Pipeline Development - Design, build, and maintain scalable and reliable data pipelines to ingest, process, and transform data from various sources.
Data Integration & Management - Integrate structured and unstructured data from internal and external systems.
Ensure data quality, consistency, and availability across platforms.
Cloud-Based Data Engineering - Leverage AWS services (e.g. S3, Lambda, Glue, Redshift, EMR) to build cloud-native data solutions.
Optimize cloud resources for performance and cost-efficiency.
Programming & Automation - Use Python for data manipulation, ETL workflows, and automation of data tasks.
Develop reusable scripts and modules for data processing.
Collaboration & Stakeholder Engagement - Work closely with data scientists, analysts, and business teams to understand data needs.
Translate business requirements into technical solutions.
Monitoring & Optimization - Monitor data pipelines and troubleshoot issues proactively.
Continuously improve the performance, scalability, and reliability of data systems.
Key Skills
Apache Hive, S3, Hadoop, Redshift, Spark, AWS, Apache Pig, NoSQL, Big Data, Data Warehouse, Kafka, Scala
Employment Type : Full Time
Experience : years
Vacancy : 1