ABOUT THIS FEATURED OPPORTUNITY
Join our Data Operations Team as a Python Engineer, supporting machine learning and AI teams that depend on high-quality datasets to train their models. You'll work at the intersection of data engineering, automation, and operational excellence, delivering datasets across approximately 200 projects per year. These include use cases such as image generation, animation, and other generative AI applications. Many projects are highly confidential, so engineers must be able to assess data quality and relevance even without full visibility into the end use case.
We're looking for someone who can design and manage data pipelines, debug issues efficiently, and operate independently across multiple fast-paced projects. Strong communication and attention to detail are essential: you'll need to respond quickly, handle issues proactively, and deliver accurate work the first time. Mistakes and rework can put project timelines at serious risk, so precision and accountability are critical. The ideal candidate will be highly responsive, reliable, and thorough in communication, and must be available to work 9am-4pm PST, even if located in a different state.
THE OPPORTUNITY FOR YOU
- Work on 3-4 projects to start, scaling up to 6-10 during peak season
- Contribute to data collection, annotation, and generation pipelines using Python and distributed systems (Spark)
- Collaborate with a tight-knit and highly responsive team, engaging in biweekly check-ins with team leads
- Gain experience with confidential, multimodal, and LLM-related datasets across a high volume of AI/ML projects
- Influence how large-scale datasets are prepared for training models across an enterprise AI org
KEY SUCCESS FACTORS
- 2+ years of experience in data engineering or Python development, with a strong foundation in Computer Science or Data Science
- Proficiency in distributed systems (e.g., Spark), and solid understanding of multithreading vs. multiprocessing
- Demonstrated ability to design scalable pipelines, handle diverse data structures, and manage large-scale workflows
- Comfortable operating under pressure, context-switching across multiple projects, and working with ambiguity
NICE TO HAVES
- Familiarity with Airflow, Spark, or Flask for scalable API/UI development
- Experience with Docker, containerization, and CI/CD tools (e.g., Jenkins)
- Exposure to LLMs, multimodal data, or generative AI workflows
- Prior involvement in designing tools to automate or scale ML data pipelines
- Ability to collaborate in a high-volume, high-trust environment: your work will power some of the most impactful ML use cases in the organization
25-14750