Job Description
Data Engineer III (Large Language Models)
About Catalytic Data Science (CDS):
Catalytic Data Science is a groundbreaking cloud R&D platform designed to integrate volumes of scientific resources, data, and analytic tools while providing the ability to network with colleagues in one secure and scalable environment. By enabling R&D teams to work more collaboratively and improving productivity company-wide, the Catalytic platform helps teams achieve key R&D milestones faster and with greater accuracy. Our customers are passionate about making the world a better place, and we are inspired by the opportunity to help them.
The Role
You are a Data Engineer with experience in processing terabytes of data and working with large language models (LLMs). You have experience in creating and automating scalable, fault-tolerant, and reproducible data pipelines for natural language processing (NLP) using AWS technologies. You will design and implement data ingestion, processing, and storage solutions that can handle massive amounts of text data from various sources. You are interested in helping to create a platform built entirely on top of AWS. You are eager to join a team of Life Scientists and Software Engineers who believe the brightest minds in research should have the best tools to drive innovation.
What You'll Do
- Build, test, and operate automated Extract, Transform, and Load (ETL) pipelines that process terabytes of text data nightly.
- Develop service frontends around our various backend data stores (AWS Aurora, MySQL, Elasticsearch, S3).
- Rapidly prototype, test, and deploy data pipelines for LLMs using AWS.
- Collaborate with data scientists and NLP engineers to understand the data requirements and specifications for LLMs and related tasks such as text summarization, translation, and question answering.
- Optimize the performance, reliability, and scalability of the data pipelines and LLMs by applying best practices and techniques such as data partitioning, caching, compression, and monitoring.
- Ensure the quality, integrity, and security of the data by implementing data validation, cleaning, and governance policies and procedures.
- Research and evaluate new technologies and methods for data engineering and LLMs and stay updated with the latest trends and developments in the field.
- Participate in data architecture and engineering decisions, bringing your strong experience and knowledge to bear.
Qualifications
- Bachelor's degree or higher in computer science, engineering, or a related field.
- 3+ years of experience in data engineering, preferably with large-scale text data and LLMs, and 6+ years of any software engineering experience (including data engineering).
- Proficient in Python 3 or Java, preferably both.
- Experience with data modeling, ETL, and data warehouse design and implementation.
- Expertise with ETL schedulers such as Airflow, Prefect, or similar frameworks.
- Familiar with LLMs and NLP concepts and frameworks such as Transformers, BERT, GPT, PaLM, and LLaMA.
- Day-to-day experience using AWS technologies such as Lambda, ECS Fargate, SQS, and SNS.
- Experience extracting, processing, storing, and querying petabyte-scale datasets.
- Familiarity with building and using containers.
- Familiarity with event-based microservices.
- Strong communication, collaboration, and problem-solving skills.
Core Skills:
- ETL Processes
- Data Modeling and Database Design
- Proficiency in Large Language Models
- Data Pipeline Optimization
- Cross-functional Collaboration
- Problem-solving and Analytical Skills
Nice-to-Haves
- Prior experience with Elasticsearch (custom development and/or administration) is a huge plus
- Knowledge of graph databases
What Do We Love in Team Members?
Your specialization is less important than your ability to learn fast and adapt to shifting technologies. We're especially fond of people who:
- Focus on customers' needs and our company's goals, not just writing code
- Iterate until customers love what you've built
- Self-start and initiate
- Self-organize
- Strive to grow personally and professionally, beyond just expanding technical abilities
- Love to experiment with new technology and share knowledge with the team
In compliance with federal law, all persons hired will be required to verify identity and eligibility to work in the United States and to complete the required employment eligibility verification form upon hire.
remote work