Data Engineer

INSPYR Solutions, Cupertino, CA, United States
4 days ago
Contract type
  • Full-time
Job description

ABOUT THIS FEATURED OPPORTUNITY

Join our Data Operations Team as a Python Engineer, supporting machine learning and AI teams that depend on high-quality datasets to train their models. You'll work at the intersection of data engineering, automation, and operational excellence, delivering datasets across approximately 200 projects per year. These include use cases such as image generation, animation, and other generative AI applications. Many projects are highly confidential, so engineers must be able to assess data quality and relevance even without full visibility into the end use case.

We're looking for someone who can design and manage data pipelines, debug issues efficiently, and operate independently across multiple fast-paced projects. Strong communication and attention to detail are essential: you'll need to respond quickly, handle issues proactively, and deliver accurate work the first time. Mistakes or rework can pose serious risks to project timelines, so precision and accountability are critical. The ideal candidate will be highly responsive, reliable, and thorough in communication, and must be available to work 9am-4pm PST, even if located in a different state.

THE OPPORTUNITY FOR YOU

  • Work on 3-4 projects to start, scaling up to 6-10 during peak season
  • Contribute to data collection, annotation, and generation pipelines using Python and distributed systems (Spark)
  • Collaborate with a tight-knit and highly responsive team, engaging in biweekly check-ins with team leads
  • Gain experience with confidential, multimodal, and LLM-related datasets across a high volume of AI/ML projects
  • Influence how large-scale datasets are prepared for training models across an enterprise AI org

KEY SUCCESS FACTORS

  • 2+ years of experience in data engineering or Python development, with a strong foundation in Computer Science or Data Science
  • Proficiency in distributed systems (e.g., Spark), and solid understanding of multithreading vs. multiprocessing
  • Demonstrated ability to design scalable pipelines, handle diverse data structures, and manage large-scale workflows
  • Comfortable operating under pressure, context-switching across multiple projects, and working with ambiguity

NICE TO HAVES

  • Familiarity with Airflow, Spark, or Flask for scalable API/UI development
  • Experience with Docker, containerization, and CI/CD tools (e.g., Jenkins)
  • Exposure to LLMs, multimodal data, or generative AI workflows
  • Prior involvement in designing tools to automate or scale ML data pipelines
  • Ability to collaborate in a high-volume, high-trust environment, where your work will power some of the most impactful ML use cases in the organization

25-14750
