HATCH
Data Engineer
MUST BE BASED IN NYC - No Relocation
Cannot Sponsor
About Hatch
At Hatch, we're building AI that doesn't just assist behind the scenes; it converses with customers out in the wild. Backed by Y Combinator and top-tier investors like Bessemer and NextView, we're scaling fast, doubling revenue year over year, and looking for A players to help us cement our place as the category leader in AI for customer engagement.
About the Role
We are looking for a skilled data engineer to join our growing data team. You will be responsible for building, optimizing, and maintaining data pipelines that support our analytics, reporting, and AI initiatives. The ideal candidate is an experienced software and data engineer who is capable with data analytics, has worked with large-scale data processing systems, understands multi-tier data architecture, and can work closely with sysops.
Our data volume and the sophistication of our AI models are growing rapidly. To keep pace we need engineers who can treat data systems as production software, not just analytics plumbing. If you love writing robust code, designing for scale, and operating distributed systems in production, let's talk.
Note: This is not a business intelligence or analyst role. Candidates whose primary experience is building reports or dashboards will not be successful here.
Key Responsibilities
- Design, build, and own scalable batch and real-time data pipelines (Kinesis / Pub/Sub / Flink / Spark, Airflow / dbt).
- Develop production-quality services in Python or Go, with comprehensive tests, code reviews, CI / CD, and observability.
- Model, partition, and tune datasets in data lakes & warehouses (BigQuery, Aurora PostgreSQL), balancing performance, cost, and governance.
- Collaborate with backend engineers to define data contracts and streaming interfaces between services.
- Drive infrastructure-as-code (Terraform) and container orchestration (Docker / Kubernetes / EKS) for the data platform.
- Establish and monitor SLOs for data quality, latency, and availability; debug incidents across the stack.
What We're Looking For
- 5+ years combined software engineering + data engineering experience, including 3+ years building production services in Python or Go.
- Proven ability to write performant, maintainable code in large codebases, beyond SQL and low-code ETL tools.
- Deep understanding of computer science fundamentals: data structures, algorithms, concurrency, networking, and distributed systems concepts.
- Expertise with the following distributed data technologies: Kafka or Kinesis or Pub/Sub, Spark or Flink or Dask, ClickHouse / Trino, Redis, MongoDB, BigQuery.
- Hands-on experience instrumenting, monitoring, and troubleshooting services in AWS and GCP (CloudWatch, Prometheus / Grafana).
- Strong SQL skills and practical knowledge of dimensional and event-driven data modeling.
- Familiarity with containerization, CI / CD pipelines, and blue-green or canary deployments (Kubernetes / EKS, Terraform).
- Comfortable working in multiple cloud platforms (AWS and GCP).
- Excellent written and verbal communication: you explain complex systems clearly and bring others along.
Nice to Have
- Experience supporting ML & LLM inference pipelines in production (vector DBs, feature stores, prompt engineering).
- Exposure to event-driven microservices, protobuf / Avro schemas, and schema-registry governance.
- Prior success in a fast-growing startup environment where you wore multiple hats.
What We Offer
- Competitive salary and equity
- Remote (Eastern or Central Time Zone required) OR Hybrid work environment (3 days / week in our NYC office)
- Medical, dental, and vision benefits
- 401(k) plan
- Flexible PTO
- Opportunity to build at the ground floor of a high-growth, mission-driven company
- Not offering sponsorship
Why Hatch
- Shape the future of AI-driven customer service
- Build alongside founders and leaders who value speed, ownership, and ambition
- Solve hard problems that impact real businesses and customers
- Join a team of builders who care about great engineering, fast execution, and each other