Data Engineer

INSPYR Solutions, Cupertino, CA, United States

Posted 4 days ago

Employment type

Full-time

Job description

ABOUT THIS FEATURED OPPORTUNITY

Join our Data Operations Team as a Python Engineer, supporting machine learning and AI teams that depend on high-quality datasets to train their models. You'll work at the intersection of data engineering, automation, and operational excellence, delivering datasets across approximately 200 projects per year. These include use cases such as image generation, animation, and other generative AI applications. Many projects are highly confidential, so engineers must be able to assess data quality and relevance even without full visibility into the end use case.

We're looking for someone who can design and manage data pipelines, debug issues efficiently, and operate independently across multiple fast-paced projects. Strong communication and attention to detail are essential: you'll need to respond quickly, handle issues proactively, and deliver accurate work the first time. Mistakes or rework can pose serious risks to project timelines, so precision and accountability are critical. The ideal candidate will be highly responsive, reliable, and thorough in communication, and must be available to work 9am-4pm PST, even if located in a different state.

THE OPPORTUNITY FOR YOU

Work on 3-4 projects to start, scaling up to 6-10 during peak season
Contribute to data collection, annotation, and generation pipelines using Python and distributed systems (Spark)
Collaborate with a tight-knit and highly responsive team, engaging in biweekly check-ins with team leads
Gain experience with confidential, multimodal, and LLM-related datasets across a high volume of AI/ML projects
Influence how large-scale datasets are prepared for training models across an enterprise AI org

KEY SUCCESS FACTORS

2+ years of experience in data engineering or Python development, with a strong foundation in Computer Science or Data Science

Proficiency in distributed systems (e.g., Spark) and a solid understanding of multithreading vs. multiprocessing

Demonstrated ability to design scalable pipelines, handle diverse data structures, and manage large-scale workflows

Comfortable operating under pressure, context-switching across multiple projects, and working with ambiguity

NICE TO HAVES

Familiarity with Airflow, Spark, or Flask for scalable API/UI development

Experience with Docker, containerization, and CI/CD tools (e.g., Jenkins)

Exposure to LLMs, multimodal data, or generative AI workflows

Prior involvement in designing tools to automate or scale ML data pipelines

Ability to collaborate in a high-volume, high-trust environment: your work will power some of the most impactful ML use cases in the organization

25-14750

