We are seeking a highly skilled and forward-thinking Data Engineer to drive the integration of Large Language Models (LLMs) and Generative AI systems into our data ecosystem. This role will focus on designing and operationalizing intelligent data pipelines and interfaces that enable seamless interaction between curated enterprise data and advanced AI models. You will play a key role in bridging data engineering and AI innovation, ensuring secure, scalable, and high-performance systems that power next-generation language-based applications.
Key Responsibilities
Design, build, and optimize data pipelines that serve as the backbone for LLM-powered systems and AI applications.
Integrate Generative AI and LLM technologies (e.g., OpenAI, Anthropic, Azure OpenAI, or open-source models such as LLaMA or Mistral) with curated enterprise data.
Develop and maintain retrieval-augmented generation (RAG) pipelines to connect structured and unstructured data to model contexts.
Collaborate with data scientists, ML engineers, and AI researchers to ensure alignment between data readiness and model performance.
Implement agentic system architectures, including orchestration frameworks (e.g., LangChain, Semantic Kernel, or similar).
Enforce AI security, compliance, and data governance best practices to ensure responsible use of enterprise data in AI applications.
Automate LLM evaluation, model fine-tuning, and deployment workflows where applicable.
Monitor and troubleshoot AI data pipelines, ensuring high availability, scalability, and accuracy of responses.
Document design patterns, integration strategies, and operational playbooks for AI-driven data engineering.
Required Skills & Qualifications
Proven experience as a Data Engineer or ML Engineer with hands-on expertise in LLM or Generative AI system integrations.
Strong proficiency in Python, SQL, and distributed data frameworks (e.g., Spark, Databricks).
Practical understanding of RAG architectures, vector databases (e.g., Pinecone, Weaviate, Chroma, FAISS), and embedding pipelines.
Familiarity with LangChain, LlamaIndex, Semantic Kernel, or equivalent frameworks.
Experience implementing secure and compliant AI pipelines, with an understanding of AI security, prompt injection defenses, and data privacy.
Solid understanding of cloud-based AI infrastructure, preferably Azure AI Services, Azure Databricks, and Azure OpenAI Service.
Excellent problem-solving skills and ability to work across data, infrastructure, and AI teams.
Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).
Preferred Qualifications
Experience fine-tuning or customizing LLMs for enterprise use cases.
Familiarity with MLflow, MLOps, and CI/CD for model deployment.
Knowledge of medallion data architecture and Delta Lake for AI-ready data management.
Experience with streaming data systems (e.g., Kafka, Event Hubs) for real-time AI applications.
Contributions to open-source AI frameworks or enterprise AI integrations.
Key Skills
Apache Hive, S3, Hadoop, Redshift, Spark, AWS, Apache Pig, NoSQL, Big Data, Data Warehouse, Kafka, Scala
Employment Type : Full Time
Experience : years
Vacancy : 1
Engineer • Columbus, Nebraska, USA