Talent.com
Software Engineer - Data Acquisition / Web Crawling, San Francisco & Palo Alto, CA

xAI, San Francisco, CA, United States
23 hours ago
Job type
  • Full-time
Job description

Software Engineer - Data Acquisition / Web Crawling

Join a cutting-edge Data Acquisition team at xAI, where you'll power the future of AI by building world-class systems to collect and process hundreds of petabytes of data across diverse modalities: web, code, images, audio, video, and beyond. As a Software Engineer specializing in Data Acquisition and Web Crawling, you'll architect and operate large-scale distributed systems that fuel groundbreaking models like Grok 3 and its successors, delivering the high-quality data that drives xAI's mission to understand the universe.

You'll work closely with pre-training, reasoning, multimodal, and other teams to meet their unique data needs, collaborating with researchers to define precise requirements and deploy large-scale classifiers for filtering and categorizing vast datasets. This is your opportunity to tackle complex, petabyte-scale challenges, pushing the boundaries of data engineering to create the foundation for the world's most advanced AI systems.

This role is for hands-on engineers who thrive on solving tough problems, working in a flat, fast-paced environment where initiative and excellence shape leadership. If you're passionate about building robust, high-throughput data pipelines and want to directly impact the evolution of transformative AI, this is your chance to shine.

What You'll Do

  • Building petabyte-scale, high-throughput data processing systems managing hundreds of petabytes to exabytes of data.
  • Designing and operating large-scale distributed systems and pipelines processing hundreds of thousands to millions of operations per second.
  • Managing workloads across large cloud compute clusters.
  • Pre-processing datasets for AI training.
  • Building and operating large-scale crawlers, gathering and communicating requirements clearly and concisely.

Who You Are

  • Strong engineering skills with a passion for improving different aspects of data and model performance.
  • Strong proficiency in at least one compiled language: Rust, Go, C++, or Java.
  • Experience working on one or more modalities other than text, with demonstrated exceptional work.
  • Experience building bespoke data processing libraries from scratch.
  • Experience designing and implementing distributed systems in Rust.
  • Commitment to keeping up with state-of-the-art techniques for preparing AI training data.
  • Experience with performance optimization of large-scale systems is preferred.
  • Ability to organize and meticulously bookkeep data across multiple clouds, multiple modalities, and many sources.
  • Experience with SQL / NoSQL databases, especially columnar databases, is a plus.
  • Great debugging skills are a must.
  • Must have deep knowledge of how the internet works, including DNS, OSI model, crawler architectures, challenges operating crawlers, and headless browsers.

Tech Stack

  • Python
  • Rust
  • Spark
  • Kubernetes

Interview Process

After submitting your application, the team reviews your CV and statement of exceptional work. If your application passes this stage, you will be invited to a 15-minute interview ("phone interview") during which a member of our team will ask some basic questions. If you clear the initial phone interview, you will enter the main process, which consists of four technical interviews:

  • Coding assessment in a language of your choice.
  • Systems hands-on: Demonstrate practical skills in a live problem-solving session.
  • Project deep-dive: Present your past exceptional work to a small audience.
  • Meet and greet with the wider team.

Our goal is to finish the main process within one week. All interviews will be conducted via Google Meet.

Location

The role is based in the Bay Area (San Francisco and Palo Alto). Candidates are expected to be located near the Bay Area or open to relocation.

Annual Salary Range: $180,000 - $440,000 USD

xAI is an equal opportunity employer and does not unlawfully discriminate based on race, color, religion, ethnicity, ancestry, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, age, disability, medical conditions, genetic information, marital status, military or veteran status, or any other applicable legally protected characteristics. Qualified applicants with arrest or conviction records will be considered for employment in accordance with all applicable federal, state, and local laws.
