Talent.com
Software Engineer - Data Acquisition / Web Crawling
San Francisco & Palo Alto, CA

xAI
San Francisco, CA, United States
1 day ago
Job type
  • Full-time
Job description

Software Engineer - Data Acquisition / Web Crawling

Join a cutting-edge Data Acquisition team at xAI, where you'll power the future of AI by building world-class systems to collect and process hundreds of petabytes of data across diverse modalities: web, code, images, audio, video, and beyond. As a Software Engineer specializing in Data Acquisition and Web Crawling, you'll architect and operate large-scale distributed systems that fuel groundbreaking models like Grok 3 and its successors, delivering the high-quality data that drives xAI's mission to understand the universe.

You'll work closely with pre-training, reasoning, multimodal, and other teams to meet their unique data needs, collaborating with researchers to define precise requirements and deploy large-scale classifiers for filtering and categorizing vast datasets. This is your opportunity to tackle complex, petabyte-scale challenges, pushing the boundaries of data engineering to create the foundation for the world's most advanced AI systems.

This role is for hands-on engineers who thrive on solving tough problems, working in a flat, fast-paced environment where initiative and excellence shape leadership. If you're passionate about building robust, high-throughput data pipelines and want to directly impact the evolution of transformative AI, this is your chance to shine.

What You'll Do

  • Building petabyte-scale, high-throughput data processing systems managing hundreds of petabytes to exabytes of data.
  • Designing and operating large-scale distributed systems and pipelines processing hundreds of thousands to millions of operations per second.
  • Managing workloads across large cloud compute clusters.
  • Pre-processing datasets for AI training.
  • Building and operating large-scale crawlers, gathering and communicating requirements clearly and concisely.

Who You Are

  • Strong engineering skills with a passion for improving different aspects of data and model performance.
  • Strong proficiency in at least one compiled language: Rust, Go, C++, or Java.
  • Has worked on one or more modalities other than text and demonstrated exceptional work.
  • Building bespoke data processing libraries from scratch.
  • Designing and implementing distributed systems in Rust.
  • Keeping up with state-of-the-art techniques for preparing AI training data.
  • Experience with performance optimization of large-scale systems is preferred.
  • Organizing and meticulously bookkeeping data across multiple clouds, of multiple modalities, and from many sources.
  • Experience with SQL / NoSQL databases, especially columnar databases, is a plus.
  • Great debugging skills are a must.
  • Must have deep knowledge of how the internet works, including DNS, OSI model, crawler architectures, challenges operating crawlers, and headless browsers.

Tech Stack

  • Python
  • Rust
  • Spark
  • Kubernetes

Interview Process

After submitting your application, the team reviews your CV and statement of exceptional work. If your application passes this stage, you will be invited to a 15-minute interview ("phone interview") during which a member of our team will ask some basic questions. If you clear the initial phone interview, you will enter the main process, which consists of four technical interviews:

  • Coding assessment in a language of your choice.
  • Systems hands-on: Demonstrate practical skills in a live problem-solving session.
  • Project deep-dive: Present your past exceptional work to a small audience.
  • Meet and greet with the wider team.

Our goal is to finish the main process within one week. All interviews will be conducted via Google Meet.

Location

The role is based in the Bay Area [San Francisco and Palo Alto]. Candidates are expected to be located near the Bay Area or open to relocation.

Annual Salary Range: $180,000 - $440,000 USD

xAI is an equal opportunity employer and does not unlawfully discriminate based on race, color, religion, ethnicity, ancestry, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, age, disability, medical conditions, genetic information, marital status, military or veteran status, or any other applicable legally protected characteristics. Qualified applicants with arrest or conviction records will be considered for employment in accordance with all applicable federal, state, and local laws.
