Role description
Position / Title : Data Engineer Lead
Location : US (preferably Texas or Florida)
Job Description :
Looking for a data professional with 8+ years of experience in data engineering and programming to build systems that collect, manage, and convert raw data into usable information that meets business requirements. As a Data Engineer Lead, you'll play a crucial role in ensuring data reliability, quality, and efficiency within the organization.
Responsibilities :
- Analyze and organize raw data : Work with various data sources, parse documents, extract relevant information, and structure it for further processing.
- Build data systems and pipelines : Construct robust data pipelines that move data from source to target.
- Interpret trends and patterns : Use your analytical skills to identify data patterns.
- Conduct complex data analysis and report on results : Dive deep into data to extract meaningful information.
- Prepare data for prescriptive and predictive modeling : Ensure data is ready for machine learning and statistical analysis.
- Build algorithms and prototypes : Develop and test data processing algorithms.
- Combine raw information from different sources : Integrate data from various systems.
- Explore ways to enhance data quality and reliability : Continuously improve data processes.
- Identify opportunities for data acquisition : Stay informed about new data sources.
- Develop analytical tools and programs : Create tools to facilitate data analysis.
- Collaborate with data scientists and architects : Work closely with other data professionals to achieve common goals.
- Design, implement, and maintain a data catalog to enhance data discovery, accessibility, and governance within the organization
- Implement data access controls, data encryption, and data masking techniques
- Familiarity with data visualization tools and techniques for presenting data
Mandatory Skills :
- Previous experience as a data engineer lead or in a similar role, with at least 8+ years of relevant work experience.
- Good knowledge of programming languages and frameworks (e.g., Python, Java, Spark).
- Design, develop, and maintain data pipelines.
- Experience working with data catalog tools (e.g., OpenMetadata, DataHub).
- Experience connecting and configuring data sources (databases, data warehouses, BI tools, etc.) to the data catalog using native connectors or APIs.
- Experience working with REST APIs and services, messaging, and event technologies.
- Experience working with large and complex data sets.
- Hands-on experience with SQL / NoSQL databases (RDS, Redshift, DynamoDB, Synapse, BigQuery, MongoDB, etc.).
- Batch / stream data processing experience.
- Experience with LLMs, NLP, and ontologies, and working on AI / ML projects.
- Monitor, troubleshoot, and optimize the performance of data infrastructure to ensure scalability, reliability, and cost efficiency.
- Stay up to date with cloud services and best practices in data engineering to continuously improve our data ecosystem.
- Good exposure to at least two public cloud platforms (Azure / AWS / GCP).
- Experience with graph databases (e.g., Neptune, RDF4J).
- Experience with vector databases (e.g., Pinecone, FAISS).
Good-to-Have Skills :
- Knowledge of or work experience in the mortgage and banking domains.
- Proficiency in building stream processing systems using Kinesis, Kafka, etc.
- Familiarity with Docker, Kubernetes, CI / CD, and cloud services (AWS, Azure, GCP).
- Technical expertise with segmentation techniques.