Talent.com
Software Engineer - Data Acquisition / Web Crawling
xAI • Palo Alto, CA, United States
30+ days ago
Job type
  • Full-time
Job description

About xAI

xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company's mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

About the Role

Join a cutting-edge Data Acquisition team at xAI, where you'll power the future of AI by building world-class systems to collect and process hundreds of petabytes of data across diverse modalities - web, code, images, audio, video, and beyond. As a Software Engineer specializing in Data Acquisition and Web Crawling, you'll architect and operate large-scale distributed systems that fuel groundbreaking models like Grok 3 and its successors, delivering the high-quality data that drives xAI's mission to understand the universe.

You'll work closely with pre-training, reasoning, multimodal, and other teams to meet their unique data needs, collaborating with engineers to define precise requirements and deploy large-scale classifiers for filtering and categorizing vast datasets. This is your opportunity to tackle complex, petabyte-scale challenges, pushing the boundaries of data engineering to create the foundation for the world's most advanced AI systems.

This role is for hands-on engineers who thrive on solving tough problems, working in a flat, fast-paced environment where initiative and excellence shape leadership. If you're passionate about building robust, high-throughput data pipelines and want to directly impact the evolution of transformative AI, this is your chance to shine.

What You'll Do

  • Building petabyte-scale, high-throughput data processing systems managing hundreds of petabytes to exabytes of data.
  • Designing and operating large-scale distributed systems and pipelines processing hundreds of thousands to millions of operations per second.
  • Managing workloads across large cloud compute clusters.
  • Pre-processing datasets for AI training.
  • Building and operating large-scale crawlers, gathering and communicating requirements clearly and concisely.

Who You Are

  • Strong engineering skills with a passion for improving different aspects of data and model performance.
  • Strong proficiency in at least one compiled language: Rust, Go, C++, or Java.
  • Experience with one or more modalities other than text, with demonstrated exceptional work.
  • Building bespoke data processing libraries from scratch.
  • Designing and implementing distributed systems in Rust.
  • Keeping up with state-of-the-art techniques for preparing AI training data.
  • Experience with performance optimization of large-scale systems is preferred.
  • Organizing and meticulously bookkeeping data across multiple clouds, of multiple modalities, and from many sources.
  • Experience with SQL / NoSQL databases, especially columnar databases, is a plus.
  • Great debugging skills are a must.
  • Must have deep knowledge of how the internet works, including DNS, the OSI model, crawler architectures, the challenges of operating crawlers, and headless browsers.

Tech Stack

  • Python
  • Rust
  • Spark
  • Kubernetes

Interview Process

After submitting your application, the team reviews your CV and statement of exceptional work. If your application passes this stage, you will be invited to a 15-minute interview ("phone interview") during which a member of our team will ask some basic questions. If you clear the initial phone interview, you will enter the main process, which consists of four technical interviews:

  • Coding assessment in a language of your choice.
  • Systems hands-on: Demonstrate practical skills in a live problem-solving session.
  • Project deep-dive: Present your past exceptional work to a small audience.
  • Meet and greet with the wider team.

Our goal is to finish the main process within one week. All interviews will be conducted via Google Meet.

Location

The role is based in the Bay Area [San Francisco and Palo Alto]. Candidates are expected to be located near the Bay Area or open to relocation.

Annual Salary Range

$180,000 - $440,000 USD

Benefits

Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.

xAI is an equal opportunity employer.
