We are eager to attract the best, so we offer competitive compensation and a generous benefits package, including full health insurance (medical, dental and vision), 401(k), life insurance, disability and more.
What you’ll do on a typical day:
Data Pipeline Design and Development: Lead the design, implementation, and maintenance of data pipelines to support data ingestion, transformation, and storage on GCP and Snowflake.
Collaboration: Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and deliver data solutions that meet business objectives.
Platform Optimization: Optimize and enhance the performance, scalability, and reliability of existing data pipelines, ensuring efficient data processing and storage.
Technology Stack: Stay abreast of industry trends and emerging technologies in data engineering and incorporate them into the data architecture where applicable. Experience with the following modern data stack is highly preferred: GCP, Snowflake, Fivetran, and dbt.
Quality Assurance: Implement best practices for data quality, validation, and testing to ensure the accuracy and integrity of data throughout the pipeline.
Documentation: Create and maintain comprehensive documentation for data pipelines, ensuring knowledge transfer and supportability.
What you need to succeed at GXO:
At a minimum, you’ll need:
Bachelor’s degree in Computer Science, Data Science, or equivalent related work experience
5+ years of experience in designing and building data pipelines on cloud platforms, preferably GCP and Snowflake.
Expertise in working with modern data warehousing solutions such as Snowflake, including Snowpipe Streaming, warehouse optimization, and clustering. Real-time data processing experience using Kafka is highly preferred.
Python and Advanced SQL: Must be adept at scripting in Python, particularly for data manipulation and integration tasks, and have a solid grasp of advanced SQL techniques for querying, transformation, and performance optimization (an illustrative sketch follows this list).
Data Modeling: Understanding of best practices for data modeling, including star schemas, snowflake schemas, and data normalization techniques. A good understanding of the data structures that support email targeting.
ETL/ELT Processes: Experience in designing, building, and optimizing ETL/ELT pipelines to process large datasets using dbt.
Apache Airflow: Experience in building, deploying, and optimizing DAGs in Airflow or a similar tool (see the DAG sketch after this list).
GitHub: Experience with version control, branching, and collaboration on GitHub.
Data Visualization: Knowledge of tools like Superset, Looker, or Python visualization libraries (Matplotlib, Seaborn, Plotly, etc.); a small plotting sketch follows this list.
Collaboration and Communication: Ability to work closely with data scientists, analysts, and other stakeholders to translate business requirements into technical solutions. Strong documentation skills for pipeline design and data flow diagrams.
Data Privacy: Knowledge of data privacy laws, including GDPR, CCPA, and other regulations, to ensure compliance in data privacy practices.
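
As a minimal illustration of the Python and advanced SQL work described above, the sketch below deduplicates a staging table with a window function through the Snowflake Python connector. The connection parameters, table, and column names are placeholders for illustration only and are not part of this posting.

# Minimal sketch: Python plus advanced SQL (window function) against Snowflake.
# Connection parameters and table/column names are placeholders.
import os

import snowflake.connector

DEDUPE_SQL = """
SELECT order_id, customer_id, order_total, updated_at
FROM (
    SELECT o.*,
           ROW_NUMBER() OVER (
               PARTITION BY order_id
               ORDER BY updated_at DESC
           ) AS rn
    FROM raw.orders_staging AS o
) AS dedup
WHERE rn = 1
"""

def fetch_latest_orders():
    """Return only the most recent version of each order from the staging table."""
    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse="ANALYTICS_WH",
        database="ANALYTICS",
        schema="RAW",
    )
    try:
        cur = conn.cursor()
        cur.execute(DEDUPE_SQL)
        return cur.fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    for row in fetch_latest_orders():
        print(row)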
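
For the Apache Airflow item, the sketch below shows a minimal daily extract-transform-load DAG using the TaskFlow API, assuming Apache Airflow 2.4 or later (where the schedule argument is accepted). The DAG name and task bodies are hypothetical stubs.

# Minimal sketch of a daily ELT DAG, assuming Apache Airflow 2.4+ (TaskFlow API).
# Task bodies are stubs; a real pipeline would call Fivetran, Snowflake, or dbt here.
from datetime import datetime

from airflow.decorators import dag, task

@dag(
    dag_id="daily_orders_elt",  # hypothetical DAG name
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
)
def daily_orders_elt():
    @task
    def extract() -> list[dict]:
        # Placeholder for pulling raw records (e.g., from an API or landing bucket).
        return [{"order_id": 1, "order_total": 42.0}]

    @task
    def transform(rows: list[dict]) -> list[dict]:
        # Placeholder for light validation/transformation before loading.
        return [r for r in rows if r["order_total"] > 0]

    @task
    def load(rows: list[dict]) -> None:
        # Placeholder for writing to Snowflake or triggering a dbt run.
        print(f"Would load {len(rows)} rows")

    load(transform(extract()))

daily_orders_elt()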
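
For the data visualization item, a short Matplotlib sketch of the kind of chart a pipeline might feed; the numbers are made up for illustration.

# Minimal sketch: plotting daily row counts with Matplotlib (illustrative data only).
import matplotlib.pyplot as plt

days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
rows_loaded = [120_000, 135_500, 128_250, 142_000, 139_750]

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(days, rows_loaded, color="steelblue")
ax.set_title("Rows loaded per day (illustrative)")
ax.set_ylabel("Row count")
fig.tight_layout()
fig.savefig("rows_loaded.png")  # or plt.show() in an interactive session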