The company is redefining how enterprises prepare and optimize data at the most fundamental layer of the AI stack—where raw information becomes usable intelligence. Our technology operates deep in the data infrastructure layer, making data efficient, secure, and ready for scale.
We eliminate the hidden inefficiencies in modern data platforms—slashing storage and compute costs, accelerating pipelines, and boosting platform efficiency. The result: 60%+ lower storage costs, up to 60% lower compute spend, 3× faster data processing, and 20% overall efficiency gains.
Why It Matters
Massive data should fuel innovation, not drain budgets. We remove the bottlenecks holding AI and analytics back—making data lighter, faster, and smarter so teams can ship breakthroughs, not babysit storage and compute bills.
Who We Are
- World-renowned researchers in compression, information theory, and data systems
- Elite engineers from Google, Pure Storage, Cohesity, and top cloud teams
- Enterprise sellers who turn ROI into seven-figure wins
Powered by World-Class Investors & Customers
$65M+ raised from NEA, Bain Capital, A Capital, and operators behind Okta, Eventbrite, Tesla, and Databricks. Our platform already processes hundreds of petabytes for industry leaders.
Smarter Infrastructure for the AI Era
We make data efficient, secure, and ready for scale—think smarter, more foundational infrastructure for the AI era. Our technology integrates directly with modern data stacks like Snowflake, Databricks, and S3-based data lakes, enabling:
- 60%+ reduction in storage costs and up to 60% lower compute spend
- 3× faster data processing
- 20% platform efficiency gains
Trusted by Industry Leaders
Enterprise leaders globally already rely on the company to cut costs, boost performance, and unlock more value from their existing data platforms.
A Deep Tech Approach to AI
We’re unlocking the layers beneath platforms like Snowflake and Databricks, making them faster, cheaper, and more AI-native. We combine advanced research with practical productization, powered by a dual-track strategy:
- Research: Led by Chief Scientist Andrea Montanari (Stanford Professor), we publish 1–2 top-tier papers per quarter.
- Product: Actively processing 100+ PB today and targeting exabyte scale by Q4 2025.
Backed by the Best
We’ve raised $65M+ from NEA, Bain Capital, A Capital, and operators behind Okta, Eventbrite, Tesla, and Databricks.
Our Mission
To convert entropy into intelligence, so every builder—human or AI—can make the impossible real.
We’re building the default data substrate for AI, and a generational company built to endure beyond any single product cycle.
WHAT YOU’LL DO
This is a deep systems role for someone who lives and breathes distributed infrastructure, understands how data moves at scale, and wants to build the next‑generation AI data platform from the ground up.
- Own the ACID backbone. Design and harden transactional layers and metadata services so that petabyte-scale tables can time-travel in microseconds and schema evolution becomes a non-event.
- Turn metadata into rocket fuel. Build compaction, caching, and pruning services that keep millions of file pointers within 50 ms from lookup to plan.
- Squeeze more signal per byte. Optimize data layouts—from column ordering to dictionary and bit-packing, bloom filters, and zone-map indexes—to cut scan I/O by 10× on real-world workloads.
- Ship adaptive indexing with research. Co-invent machine-driven indexes that learn access patterns and automatically re-partition nightly—no more manual “analyze table” ever again.
- Scale the engine, not the babysitting. Write Spark, Flink, or batch pipelines that autoscale across S3, GCS, and ADLS; expose observability hooks; and survive chaos drills without triggering a pager storm.
- Code for longevity. Write clean, test-soaked Java, Scala, Go, or C++. Document key invariants so future teams extend the system—instead of rewriting it.
- Measure success in human latency. If analysts see their dashboards refresh in blink-level time, you’ve won. Publish your breakthroughs and mentor the next engineer to raise the bar again.
WHAT WE’RE LOOKING FOR
You’ve built systems where performance, resilience, and clarity of design all matter. You thrive at the intersection of infrastructure engineering and applied research, and care deeply about both how something works and how well it works at scale.
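As a flavor of the kind of problem this role centers on, consider the zone-map pruning mentioned above: a query planner keeps per-file min/max statistics and skips every file whose value range cannot match the predicate, so most of a petabyte-scale table is never read. The sketch below is a minimal, hypothetical illustration in Go (the `ZoneMap` type and `prune` function are invented for this example, not our actual implementation):

```go
package main

import "fmt"

// ZoneMap holds per-file min/max statistics for one column.
// Real systems track these at finer granularity too, e.g. per
// row group in Parquet footers.
type ZoneMap struct {
	File     string
	Min, Max int64
}

// prune keeps only the files whose [Min, Max] range overlaps the
// predicate range [lo, hi]; every other file can be skipped
// without any I/O.
func prune(zones []ZoneMap, lo, hi int64) []ZoneMap {
	var keep []ZoneMap
	for _, z := range zones {
		if z.Max >= lo && z.Min <= hi {
			keep = append(keep, z)
		}
	}
	return keep
}

func main() {
	zones := []ZoneMap{
		{"part-000.parquet", 0, 999},
		{"part-001.parquet", 1000, 1999},
		{"part-002.parquet", 2000, 2999},
	}
	// Predicate: WHERE id BETWEEN 1500 AND 1600
	for _, z := range prune(zones, 1500, 1600) {
		fmt.Println(z.File) // prints only part-001.parquet
	}
}
```

The interesting engineering is everything around this trivial loop: keeping the statistics fresh under compaction, serving millions of them within the latency budget, and choosing layouts (sort orders, partitioning) that make the ranges tight enough to prune aggressively.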
Core Skills
- Distributed Systems and Storage Fundamentals — consistency, replication, sharding, durability, transactions.
- Columnar Storage Optimization — deep knowledge of Parquet or similar formats (column ordering, compression, zone maps).
- Metadata and Indexing Systems — experience building metadata-driven services, compaction, caching, and adaptive indexing.
- Distributed Compute at Scale — production-grade Spark/Flink or equivalent pipeline development across S3, GCS, or ADLS.
- Programming for Scale and Longevity — strong coding in Java, Scala, Go, or C++, with clean testing and documentation practices.
- Resilient Systems and Observability — you’ve built systems that survive chaos drills and expose the right metrics.
Desired Skills
- Exposure to open table formats such as Apache Iceberg, Delta Lake, or Hudi.
- Experience with catalog services, query planning, or compaction frameworks.
- OSS contributions or published work in data infrastructure or distributed systems.
WHY JOIN US
If you’ve helped build the modern data stack at a large company—Databricks, Snowflake, Confluent, or similar—you already know how critical lakehouse infrastructure is to AI and analytics at scale. At the company, you’ll take that knowledge and apply it where it matters most: at the most fundamental layer of the data ecosystem.
- Own the product, not just the feature. At the company, you won’t be optimizing edge cases or maintaining legacy systems. You’ll architect and build foundational components that define how enterprises manage and optimize data for AI.
- Move faster, go deeper. No multi-month review cycles or layers of abstraction—just high-agency engineering work where great ideas ship weekly. You’ll work directly with the founding team, engage closely with design partners, and see your impact hit production fast.
- Work on hard, meaningful problems. From transaction-layer design in Delta and Iceberg, to petabyte-scale compaction and schema evolution, to adaptive indexing and cost-aware query planning—this is deep systems engineering at scale.
- Join a team of expert builders. Our engineers have designed the core internals of cloud-scale data systems, and we maintain a culture of peer-driven learning, hands-on prototyping, and technical storytelling.
- Core differentiation. We’re focused on unlocking a deeper layer of AI infrastructure. By optimizing the way data is stored, processed, and retrieved, we make platforms like Snowflake and Databricks faster, more cost-efficient, and more AI-native. Our work sits at the most fundamental layer of the AI stack: where raw data becomes usable intelligence.
- Be part of something early—without the chaos. The company has already secured $65M+ from NEA, Bain Capital Ventures, A Capital, and legendary operators from Okta, Tesla, and Databricks.
- Grow with the company. You’ll have the chance to grow into a technical leadership role, mentor future hires, and shape both the engineering culture and product direction as we scale.
COMPENSATION & BENEFITS
- Competitive salary and meaningful equity
- Unlimited PTO + quarterly recharge days
- Premium health, vision, and dental
- Team offsites, deep tech talks, and learning stipends
- Help build the foundational infrastructure for the AI era
The company is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.