bmax

133 Million Chunks

We have 133 million vector chunks sitting in MongoDB Atlas. About a terabyte. 90% of that storage is embeddings and text — 512-dimension vectors across 100,000+ government entities, growing by 10 million pages every month.

It costs us $10K/month. That's broken.

We're migrating to Turbopuffer. Projected cost: $900/month at our current query volume. That's not a typo.

Why MongoDB doesn't work for this

MongoDB Atlas vector search is fine if you have a few million documents and don't think too hard about the bill. At 133 million chunks it becomes a problem. We looked at their Online Archive for cost reduction and hit an immediate wall: it's built on Atlas Data Federation, which explicitly doesn't support the $vectorSearch pipeline stage. Dead end.

We tried downgrading search nodes. That took the node cost from $8,800/month to $6,570/month, but CPU load from triggers became the bottleneck. The architecture just doesn't want to be cheap at this scale.

Meanwhile, Turbopuffer supports 500 million documents per namespace. Our 133 million chunks aren't even close to their ceiling. They offer 50% batch discounts on writes over 3MB. And because it's built on object storage, the economics are fundamentally different — you're not paying database prices for what is essentially a similarity lookup.
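To actually capture that batch discount, writes need to be grouped so each upsert payload clears the 3MB threshold. A minimal sketch of that grouping logic (the function name and JSON-based size estimate are illustrative, not Turbopuffer's client API):

```python
import json

BATCH_TARGET_BYTES = 3 * 1024 * 1024  # discount threshold for batched writes

def batch_for_discount(docs, target=BATCH_TARGET_BYTES):
    """Group docs into write batches of at least `target` serialized bytes
    so each upsert qualifies for the batch discount. The final batch may
    fall short; it just doesn't get the discount."""
    batches, current, size = [], [], 0
    for doc in docs:
        current.append(doc)
        size += len(json.dumps(doc).encode())
        if size >= target:
            batches.append(current)
            current, size = [], 0
    if current:
        batches.append(current)
    return batches
```

At 10 million new pages a month, almost every batch clears the threshold, so in practice nearly all write volume gets the discounted rate.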

The migration strategy

We're not doing a big-bang cutover. That's how you break production on a Tuesday and spend the rest of the week explaining yourself to customers.

Shali is leading the migration. The approach: dual-write. Keep existing Mongo writes unchanged. Background jobs replicate data to Turbopuffer hourly. Every new document goes to both. We validate Turbopuffer results against Mongo in shadow mode for a few weeks, build confidence, then flip reads over gradually.
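The dual-write shape is simple: the primary write must succeed, the mirror write is best-effort, and the hourly replication job sweeps up anything the mirror missed. A sketch under those assumptions (the class and store interface are illustrative):

```python
class DualWriter:
    """Write to the primary (Mongo) synchronously; mirror to the new
    store (Turbopuffer) best-effort. Primary failures propagate; mirror
    failures are logged so the hourly replication job can backfill them."""

    def __init__(self, primary, mirror, log=print):
        self.primary, self.mirror, self.log = primary, mirror, log

    def upsert(self, doc):
        self.primary.upsert(doc)       # source of truth, must succeed
        try:
            self.mirror.upsert(doc)    # shadow write, never blocks the caller
        except Exception as exc:
            self.log(f"mirror write failed for {doc['id']}: {exc}")
```

The key property: a Turbopuffer outage during the migration degrades nothing, because reads still come from Mongo and the replication job repairs the gap.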

The tricky part is embeddings. We're also switching from OpenAI's text-embedding-3-large to Voyage contextual embeddings, which showed 2-3x improvements across recall, MRR, precision, and NDCG in our evals. OpenAI was missing 74% of relevant chunks. Voyage misses 44%. That's the difference between surfacing a procurement signal and missing it entirely.

But you can't just swap embedding models on an existing corpus. The vectors are incompatible. So during the transition, every query generates both an OpenAI and a Voyage embedding, searches the appropriate namespace based on when the document was indexed, then merges and ranks the results. It's ugly. It works. And it goes away once the backfill is done — about $75 for the full run of 800,000 chunks against Voyage.
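The post doesn't specify how the two result sets get ranked against each other; raw similarity scores from different embedding models aren't comparable, so one common choice is reciprocal rank fusion, which merges by rank instead of score. A minimal sketch (an assumption about the merge step, not the confirmed implementation):

```python
def rrf_merge(openai_hits, voyage_hits, k=60, top=10):
    """Reciprocal-rank fusion over two ranked lists of chunk IDs.
    Each hit contributes 1 / (k + rank); chunks found by both
    embedding models accumulate score and rise to the top."""
    scores = {}
    for hits in (openai_hits, voyage_hits):
        for rank, doc_id in enumerate(hits):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)[:top]
```

Rank-based fusion also sidesteps the need to calibrate OpenAI and Voyage score distributions against each other, which matters for a path that only exists until the backfill finishes.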

Cold storage: the other 76.5%

Most of our vector data is old. Government meeting minutes from 2022 aren't generating buying signals in 2026. But we were paying to store and index all of it at the same tier.

Ikram shipped an automated archival pipeline in about five days. Anything older than six months gets moved: embeddings and text stripped out, chunk IDs and metadata preserved, full content pushed to S3. A lightweight Postgres index handles the read fallback if anyone actually needs the old stuff. A weekly Temporal workflow runs the migration, tags documents is_cold: true, done.
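The core of one archival sweep is a pure split: decide which chunks qualify, strip the heavy fields for S3, and keep a thin record for the index. A sketch of that pass (field names and the 183-day cutoff are assumptions based on the description above):

```python
from datetime import datetime, timedelta, timezone

COLD_AFTER = timedelta(days=183)  # roughly six months

def archive_pass(docs, now=None):
    """One archival sweep. Returns (hot, cold_index, s3_payloads):
    hot docs stay as-is; cold docs keep only ID + metadata (tagged
    is_cold) for the index, while embeddings and text go to S3."""
    now = now or datetime.now(timezone.utc)
    hot, cold_index, s3_payloads = [], [], []
    for doc in docs:
        if now - doc["indexed_at"] > COLD_AFTER:
            s3_payloads.append({"id": doc["id"],
                                "text": doc["text"],
                                "embedding": doc["embedding"]})
            cold_index.append({"id": doc["id"],
                               "metadata": doc["metadata"],
                               "is_cold": True})
        else:
            hot.append(doc)
    return hot, cold_index, s3_payloads
```

Running this weekly from a scheduled workflow keeps the hot tier bounded: data ages out continuously instead of piling up until the next cost crisis.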

76.5% of our data qualified for archival. Combined with the Turbopuffer migration and search node optimization, we're projecting over $100K in annual savings.

  • MongoDB Atlas: $10K/month for 133M chunks
  • Turbopuffer: ~$900/month at 2-3 QPS
  • Cold storage archival: 76.5% of data moved to S3
  • Embedding backfill (Voyage): ~$75 total
  • Mass corpus update: ~$500 for 500GB
  • Projected annual savings: $100K+

What this actually means

Notion ran the same play — moved to Turbopuffer, saved millions, killed per-user AI charges entirely. Linear ditched Elasticsearch and PGVector for the same reason. The pattern is clear: if you're paying database prices for vector search at scale, you're doing it wrong.

Vector search is a commodity. The value isn't in the similarity lookup — it's in what you index, how you chunk it, and what you do with the results. We process 10 million pages of government documents every month. The intelligence is in the pipeline, not the storage layer. Pay accordingly.
