Data Engineer
Institute of Foundation Models • Sunnyvale, CA, United States
1 day ago
Job type
  • Full-time
Job description

About the Institute of Foundation Models

We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.

As part of our team, you'll have the opportunity to work on the core of cutting-edge foundation model training, alongside world-class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem-solving skills will be instrumental in establishing MBZUAI as a global hub for high-performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.

The Role

As a Data Engineer specializing in Natural Language Processing (NLP) and large-scale data processing, you will quickly and effectively gather, curate, and prepare high-quality datasets to support cutting-edge NLP research. Your role will be instrumental in enabling researchers by delivering essential data through efficient and scalable engineering practices, including web crawling, LLM-generated content refinement, and robust data pipelines, primarily leveraging Python and related technologies.

Key Responsibilities

  • Rapidly collect, curate, and preprocess datasets based on detailed specifications provided by NLP researchers, delivering data within tight timelines.
  • Develop and maintain efficient web crawling solutions, APIs, and automated workflows to continuously improve data collection processes.
  • Refine and evaluate outputs from Large Language Models (LLMs) to generate structured datasets suitable for model training and benchmarking.
  • Implement scalable data pipelines, ensuring efficient data processing, storage, retrieval, and distribution to research teams.
  • Collaborate closely with researchers and engineers to ensure collected data meets specified quality and relevance criteria.
  • Document data collection methodologies, dataset characteristics, and pipeline architecture clearly and effectively.
  • Engage with peer teams and participate in technical reviews to uphold best practices and data quality standards.
  • Represent MBZUAI at industry and research forums, showcasing technical capabilities in large-scale data processing and AI data infrastructure.

Academic Qualifications

  • Bachelor's degree in Computer Science, Data Science, Engineering, or a related technical field required.
  • Master's degree, PhD, or equivalent experience in Computer Science, Data Engineering, or a related technical field preferred.

Professional Experience - Required

  • Extensive experience in data engineering, data processing, and automation using Python.
  • Demonstrated proficiency in designing and deploying web crawling solutions, automated data extraction, and processing pipelines.
  • Strong understanding of data structures, algorithms, databases, SQL, and performance optimization.
  • Experience working with cloud infrastructure and distributed data processing frameworks (e.g., AWS, Spark, Kafka, Kubernetes).
  • Excellent problem-solving abilities, attention to detail, and the capability to rapidly address technical challenges.
  • Strong communication and collaboration skills with cross-functional teams.

Professional Experience - Preferred

  • Proven track record of supporting NLP or AI research teams with rapid and reliable data delivery.
  • Experience working with large language models, including evaluation, efficient inference, and prompt engineering.
  • Experience with refining outputs from large-scale AI models, such as LLM-generated data.
  • Contributions to open-source projects, coding competitions, or high visibility in coding communities (e.g., GitHub, Stack Overflow).
  • Familiarity with the latest advancements in NLP data processing and large language model technologies.

$150,000 - $450,000 a year

Visa Sponsorship

This position is eligible for visa sponsorship.

Benefits Include

  • Comprehensive medical, dental, and vision benefits
  • Bonus
  • 401K Plan
  • Generous paid time off, sick leave and holidays
  • Paid Parental Leave
  • Employee Assistance Program
  • Life insurance and disability