Research Engineer (Data Infra / ML)
Bay Area (Hybrid)
Can you build & optimize distributed ML pipelines with Ray or Spark?
Do you love speeding up cloud infra (Kubernetes, Docker, CI / CD)?
Excited to build the data backbone for large-scale ML training?
We’re a tier-1 VC-backed start-up developing hyper-realistic 3D simulations using AI. Our customers include leading names in industries such as autonomous vehicles, drones, and robotics.
Role
You’ll be hands-on improving CI / CD pipelines, speeding up Docker builds, and scaling scene processing on Ray. You’ll also:
- Build high-performance data pipelines for multimodal datasets (3D, video, sensor).
- Optimize distributed training and processing across Spark, Databricks, and Kubernetes.
- Work with researchers to productionize PyTorch models and streamline ML workflows.
- Develop tools that make data discoverable, reusable, and reliable throughout the ML lifecycle.
You
- Strong Python skills and experience with distributed systems (Ray, Spark, Flyte, Dask).
- Hands-on with cloud, Kubernetes, and distributed training (Ray, PyTorch DDP, Horovod).
- Familiar with dataset versioning and experiment tracking (DVC, MLflow).
Bonus Points
- Experience in simulation, robotics, or autonomy pipelines.
- Background in deep learning (PyTorch) and 3D / sensor data (LIDAR, meshes, radiance fields).
- Open-source contributions or frontend / UI experience.