The Data Engineering team is responsible for designing, building, and maintaining the Data Lake infrastructure, including ingestion pipelines, storage systems, and internal tooling for reliable, scalable access to market data.
Key Responsibilities
Ingestion & Pipelines : Architect batch and stream pipelines (Airflow, Kafka, dbt) for diverse structured and unstructured market data. Provide reusable SDKs in Python and Go for internal data producers.
Storage & Modeling : Implement and tune S3, column-oriented, and time-series data storage for petabyte-scale analytics; own partitioning, compression, TTL, versioning, and cost optimisation.
Tooling & Libraries : Develop internal libraries for schema management, data contracts, validation, and lineage; contribute to shared libraries and services that support internal data consumers in research, backtesting, and real-time trading.
Reliability & Observability : Embed monitoring, alerting, SLAs, SLOs, and CI/CD; champion automated testing, data quality dashboards, and incident runbooks.
Collaboration : Partner with Data Science, Quant Research, Backend, and DevOps to translate requirements into platform capabilities and evangelise best practices.
Qualifications :
Additional Information :
What we offer :
Remote Work : Yes
Employment Type : Full-time
Key Skills
Apache Hive, S3, Hadoop, Redshift, Spark, AWS, Apache Pig, NoSQL, Big Data, Data Warehouse, Kafka, Scala
Department / Functional Area : Data Engineering
Experience : years
Vacancy : 1
Engineer • New York City, New York, USA