Key Skills: Big Data, Python, PySpark, AWS, Scripting, Git.
Experience: 8+ years
Mode of Hire: Full Time
Skills Required:
Hands-on experience in Big Data technologies.
Mandatory: hands-on experience in Python and PySpark. While Python is practically usable for anything, we are specifically looking for application development, Extract/Transform/Load (ETL), and data lake curation experience using Python.
Experience building PySpark applications with Spark DataFrames in Python, using Jupyter Notebook and PyCharm as IDEs.
Experience optimizing Spark jobs that process huge volumes of data (see the PySpark sketch after this list).
Hands-on experience with version control tools such as Git.
Experience with AWS analytics services such as Amazon EMR, Amazon Athena, and AWS Glue.
Experience with AWS compute services such as AWS Lambda and Amazon EC2, storage services such as Amazon S3, and related services such as Amazon SNS (see the boto3 sketch after this list).
Experience with or knowledge of Bash/shell scripting, PowerShell, etc.
Experience designing CloudFormation templates (CFTs) or Terraform templates for deploying infrastructure as code (see the deployment sketch after this list).
Has built ETL processes that ingest data, copy it, and structurally transform it, covering a wide variety of formats such as CSV, TSV, XML, and JSON.
Experience working with fixed-width, delimited, and multi-record file formats (see the file-format sketch after this list).
Good to have: knowledge of data warehousing concepts, including dimensions, facts, and schema designs (star, snowflake, etc.).
Has worked with big data storage formats such as Parquet and ORC (columnar) and Avro (row-oriented); well versed in compression codecs such as Snappy and Gzip (see the Parquet sketch after this list).
Good to have: knowledge of at least one AWS database service, e.g. Aurora, RDS, Redshift, ElastiCache, or DynamoDB.
Understanding of foundational technologies such as IAM and core IaaS services (compute, storage, networking).
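
The sketches below illustrate the kind of work described above; all bucket names, paths, schemas, and identifiers are hypothetical. First, a minimal PySpark sketch of a DataFrame application with two common job optimizations: a broadcast join and a repartition before a wide aggregation.

```python
# Minimal PySpark sketch: build DataFrames, enrich with a broadcast join,
# and aggregate. Paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

orders = spark.read.csv("s3://my-bucket/orders/", header=True, inferSchema=True)
countries = spark.read.csv("s3://my-bucket/countries/", header=True, inferSchema=True)

# Broadcasting the small lookup table avoids shuffling the large table.
enriched = orders.join(F.broadcast(countries), on="country_code", how="left")

# Repartitioning by the grouping key before a wide aggregation can reduce skew.
daily = (enriched
         .repartition("order_date")
         .groupBy("order_date")
         .agg(F.sum("amount").alias("total_amount")))

daily.show(10)
```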
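A minimal sketch of the file formats called out above: delimited reads (TSV, JSON) plus positional slicing for fixed-width records, which Spark does not read natively (the field widths are assumed).

```python
# Minimal sketch of reading delimited and fixed-width files with PySpark.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("file-formats").getOrCreate()

# Delimited: TSV is just CSV with a tab separator; JSON has its own reader.
tsv_df = spark.read.csv("s3://my-bucket/input.tsv", sep="\t", header=True)
json_df = spark.read.json("s3://my-bucket/input.json")

# Fixed-width: no built-in reader, so slice each line by position.
raw = spark.read.text("s3://my-bucket/input.dat")
fixed_df = raw.select(
    F.substring("value", 1, 10).alias("account_id"),   # cols 1-10
    F.substring("value", 11, 8).alias("open_date"),    # cols 11-18
    F.substring("value", 19, 12).alias("balance"),     # cols 19-30
)
```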
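A minimal Parquet sketch: writing Snappy-compressed Parquet (Spark's default Parquet codec), with Gzip shown for contrast.

```python
# Minimal sketch: write a DataFrame as compressed Parquet and read it back.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("columnar").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])

# Snappy favors speed; Gzip trades CPU for a smaller footprint.
df.write.mode("overwrite").option("compression", "snappy") \
    .parquet("s3://my-bucket/out/snappy/")
df.write.mode("overwrite").option("compression", "gzip") \
    .parquet("s3://my-bucket/out/gzip/")

back = spark.read.parquet("s3://my-bucket/out/snappy/")
back.printSchema()
```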
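A minimal boto3 sketch of the AWS services mentioned above: an S3 upload followed by an Athena query (bucket, database, and output location are assumptions).

```python
# Minimal boto3 sketch: push a curated file to S3, then query it via Athena.
import boto3

s3 = boto3.client("s3")
s3.upload_file("daily.parquet", "my-bucket", "curated/daily.parquet")

athena = boto3.client("athena")
resp = athena.start_query_execution(
    QueryString="SELECT order_date, total_amount FROM curated.daily LIMIT 10",
    QueryExecutionContext={"Database": "curated"},
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)
print(resp["QueryExecutionId"])  # poll get_query_execution for completion
```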
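A minimal infrastructure-as-code deployment sketch, assuming a CloudFormation template deployed via boto3; the same bucket could equally be expressed as a Terraform template. Stack and bucket names are hypothetical.

```python
# Minimal sketch: deploy a CloudFormation template (CFT) with boto3.
import json
import boto3

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "DataLakeBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {"BucketName": "my-data-lake-bucket"},  # hypothetical
        }
    },
}

cfn = boto3.client("cloudformation")
cfn.create_stack(StackName="data-lake-stack", TemplateBody=json.dumps(template))
```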