Executive brief · the data layer
One data layer.
Process once. Serve many.
Make the platform MCP-ready — so an AI assistant, every dashboard, and every future app read one fresh, governed, pre-aggregated model instead of each hammering the raw stores on every query.
Today → the gap → the proposal → how it's stored → how the AI uses it → backward-compatible → extensible → cost → open questions → the ask
Where we are today
Two stores. Two dashboards.
Queried directly, every visit.
Each dashboard re-queries its raw store on every load, with a cache keyed to its fixed chart shapes. It works — for a known, finite set of dashboards.
The gap
An AI assistant asks anything.
The raw stores can't keep up.
- Caching stops working. Dashboard caches key on fixed query shapes. An AI asks arbitrary ones → almost every query is a cache miss → a fresh raw scan.
- The raw stores hit their limits. No fast distinct-count or window functions, hard row + rate caps, and AE is sampled (it undercounts) — so answers are slow, capped, or approximate.
- Cost runs away. Raw querying is billed on queries × bytes scanned. An AI fans out many arbitrary queries per question, across ~2,000 tags — unbounded by anything we control.
The MCP needs answers the raw stores can't give cheaply, freshly, or within their limits. We need a different shape of data underneath it.
Why this approach
You can't cache a question you can't predict.
A dashboard asks the same few questions — so you cache and tune for them. An AI assistant asks anything, so there's nothing fixed to optimise for. The fix isn't a faster way to query the raw data on every question — it's to stop querying raw data per question at all, and pre-build a compact model you read instead.
- Summarise once, read forever. Fold the event stream into a compact model as it arrives. Every question, expected or brand-new, becomes a cheap lookup, never a fresh scan. → predictable cost · ≤1-min fresh · ≈$0 per answer.
- Sketches, not raw rows. Count millions of distinct people in ~1.5 KB that merge across any date range or slice. → slice the data any way, with tiny memory and zero raw scanning.
- Per-tag, with a fenced escape hatch. Each tenant's model is isolated (their data never leaves their own store), and the rare question the model can't answer takes a guarded path to raw data, never an open firehose.
Build the answer once → every answer is fast, cheap and bounded, and the existing stores keep working untouched.
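To make "sketches that merge" concrete, here is a minimal HyperLogLog sketch in Python. It is an illustration only, not the production implementation: the precision (2,048 registers), the hash function and all function names are assumptions chosen so that a bit-packed sketch lands near the ~1.5 KB figure quoted above.

```python
import hashlib
import math

P = 11        # assumed precision: 2**11 = 2048 registers
M = 1 << P    # 2048 registers x 6 bits = 1536 bytes (~1.5 KB) if bit-packed


def new_sketch() -> bytearray:
    # One byte per register for readability; a packed layout would use 6 bits each.
    return bytearray(M)


def add(sketch: bytearray, item: str) -> None:
    # 64-bit hash: the top P bits pick a register, the remaining bits give the rank.
    h = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
    idx = h >> (64 - P)
    rest = h & ((1 << (64 - P)) - 1)
    rank = (64 - P) - rest.bit_length() + 1   # leading zeros in `rest`, plus one
    sketch[idx] = max(sketch[idx], rank)


def merge(a: bytearray, b: bytearray) -> bytearray:
    # The union of two sketches is a register-wise max; no raw rows are needed.
    return bytearray(max(x, y) for x, y in zip(a, b))


def estimate(sketch: bytearray) -> float:
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in sketch)
    zeros = sketch.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)        # small-range (linear counting) correction
    return raw


# Two overlapping days of made-up user ids: 100,000 distinct users in total.
monday, tuesday = new_sketch(), new_sketch()
for i in range(0, 60_000):
    add(monday, f"user-{i}")
for i in range(40_000, 100_000):
    add(tuesday, f"user-{i}")

both_days = merge(monday, tuesday)
print(round(estimate(both_days)))   # an estimate near 100,000, not an exact count
```

The merged sketch is identical to one built from the combined stream, which is why any date range or slice can be answered from stored sketches alone.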
What I'm proposing
One pre-aggregated layer in between.
The same event also feeds a per-tag cube. One router serves the AI (through the MCP) and the existing dashboards from that single model. The current writers don't change.
How it's organised — and why it stays small
A few pivot tables,
rolled up over time.
A cuboid is a pre-built pivot table for one set of dimensions. Storing every combination explodes — so we keep the handful people actually slice by:
- core — channel × device × country × consent → "sessions / revenue by any of these, in any combination"
- funnel — by step → "how many reach view → cart → checkout → buy"
- consent — country × consent state → the privacy view
- geo — by region
Fits a cuboid → instant lookup. A new slice = add a cuboid (one manifest row), not a re-build.
Recent data is kept by the minute (fresh, good for forensics). As it ages, a scheduled job rolls it up to coarser grains (minute → hour → day), then expires it.
Rolling up is a merge, not a re-scan — counters add, sketches union (the whole point of sketches). Fine detail where you need it, coarse where you don't → bounded storage.
Pre-build the pivots people use; keep them by the minute when fresh, roll up to days as they age — that's what fits a busy tag inside its storage budget.
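The "merge, not a re-scan" roll-up can be sketched as below. This is an assumed shape, not the actual job: `roll_up` is a hypothetical name, `period_start` is assumed to be Unix seconds, and the four-byte `users_hll` values are toy stand-ins for real sketches.

```python
HOUR = 3600  # assumes period_start is expressed in Unix seconds


def roll_up(minute_cells):
    """Fold minute cells into hour cells: counters add, sketch registers take the max."""
    hours = {}
    for cell in minute_cells:
        hour_start = cell["period_start"] // HOUR * HOUR
        key = (hour_start, cell["cuboid_id"], cell["dim_key"])
        out = hours.get(key)
        if out is None:
            hours[key] = {
                "events": cell["events"],
                "revenue": cell["revenue"],
                "users_hll": bytes(cell["users_hll"]),
            }
        else:
            out["events"] += cell["events"]        # exact counters simply add
            out["revenue"] += cell["revenue"]
            out["users_hll"] = bytes(              # sketches union register by register
                max(a, b) for a, b in zip(out["users_hll"], cell["users_hll"])
            )
    return hours


# Toy input: two minutes in one hour, one minute in the next.
minute_cells = [
    {"period_start": 7200, "cuboid_id": "core", "dim_key": "paid·mobile·US",
     "events": 10, "revenue": 5.0, "users_hll": bytes([1, 0, 3, 0])},
    {"period_start": 7260, "cuboid_id": "core", "dim_key": "paid·mobile·US",
     "events": 4, "revenue": 2.5, "users_hll": bytes([0, 2, 1, 0])},
    {"period_start": 10800, "cuboid_id": "core", "dim_key": "paid·mobile·US",
     "events": 7, "revenue": 0.0, "users_hll": bytes([0, 0, 0, 5])},
]
hour_cells = roll_up(minute_cells)
```

The job only ever reads existing cells, so its cost depends on the number of cells rather than on the number of raw events.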
CREATE TABLE cells (        -- the cube: 1 row per slice
  grain        TEXT,        -- minute | hour | day (time tier)
  period_start INTEGER,     -- which time bucket
  cuboid_id    TEXT,        -- which cuboid (dim-combo)
  dim_key      TEXT,        -- the dim values (e.g. paid·mobile·US)
  events       INTEGER,     -- exact counter
  revenue      REAL,        -- exact counter
  purchases    INTEGER,     -- exact counter
  users_hll    BLOB,        -- distinct users (sketch, ~1.5 KB)
  sessions_hll BLOB         -- distinct sessions (sketch)
);
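A read against this table might look like the following, using Python's built-in `sqlite3` module with an in-memory database. The sample rows, the `totals_by_dim` helper and the use of Unix seconds for `period_start` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cells (
        grain TEXT, period_start INTEGER, cuboid_id TEXT, dim_key TEXT,
        events INTEGER, revenue REAL, purchases INTEGER,
        users_hll BLOB, sessions_hll BLOB
    )
""")

# Made-up cells: two day buckets plus one minute bucket that the query must ignore.
rows = [
    ("day", 0, "core", "paid·mobile·US", 120, 300.0, 3, None, None),
    ("day", 86400, "core", "paid·mobile·US", 80, 100.0, 1, None, None),
    ("day", 86400, "core", "organic·desktop·DE", 50, 0.0, 0, None, None),
    ("minute", 86400, "core", "paid·mobile·US", 5, 0.0, 0, None, None),
]
conn.executemany("INSERT INTO cells VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)


def totals_by_dim(conn, cuboid_id, grain, start, end):
    """Sum the exact counters per dim_key over the half-open window [start, end)."""
    sql = """
        SELECT dim_key, SUM(events), SUM(revenue), SUM(purchases)
        FROM cells
        WHERE cuboid_id = ? AND grain = ?
          AND period_start >= ? AND period_start < ?
        GROUP BY dim_key
        ORDER BY dim_key
    """
    return conn.execute(sql, (cuboid_id, grain, start, end)).fetchall()


result = totals_by_dim(conn, "core", "day", 0, 2 * 86400)
for row in result:
    print(row)
```

Counters can be summed in SQL as shown. Distinct counts cannot: the `users_hll` and `sessions_hll` blobs for the selected cells would be unioned in application code and then estimated.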
How the AI uses the data
The AI doesn't see raw data.
It calls tools over the cube.
The router enforces the rules, not the AI — the AI just picks a tool and explains the answer.
| Question | Lane |
|---|---|
| in-model (e.g. "sessions by channel, last 7d") | CUBE — any window, ≤1-min fresh, $0 |
| event-level / small recent window | GOVERNED RAW — capped, redacted, ~cents |
| out-of-model + big window | REFUSE → offer the cube view, or promote it |
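The lane table above could be enforced by a routing function along these lines. Everything named here is hypothetical: the `Question` shape, the dimension names inside `CUBOID_DIMS`, and the 24-hour cap on raw windows are placeholders, not agreed values.

```python
from dataclasses import dataclass
from enum import Enum


class Lane(Enum):
    CUBE = "cube"
    GOVERNED_RAW = "governed_raw"
    REFUSE = "refuse"


# Assumed dimension names for the four cuboids described in this brief.
CUBOID_DIMS = {
    "core": frozenset({"channel", "device", "country", "consent"}),
    "funnel": frozenset({"step"}),
    "consent": frozenset({"country", "consent"}),
    "geo": frozenset({"region"}),
}
MAX_RAW_WINDOW_HOURS = 24  # placeholder cap for the governed raw lane


@dataclass(frozen=True)
class Question:
    dims: frozenset          # dimensions the question slices by
    window_hours: int        # how far back it looks
    event_level: bool = False


def route(q: Question) -> Lane:
    # In-model: every requested dimension fits inside a single cuboid.
    in_model = any(q.dims <= dims for dims in CUBOID_DIMS.values())
    if in_model and not q.event_level:
        return Lane.CUBE
    # Out-of-model or event-level: allowed only on a small recent window.
    if q.window_hours <= MAX_RAW_WINDOW_HOURS:
        return Lane.GOVERNED_RAW
    return Lane.REFUSE


print(route(Question(frozenset({"channel"}), window_hours=168)))
```

Because the decision lives in the router, the AI cannot widen its own access by rephrasing a question.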
Backward compatible by design
Same numbers. No dashboard changes.
- The new tap is additive & fail-safe — if the cube ever fails, collection and today's dashboards are unaffected.
- Adapters answer the dashboards' existing queries against the cube — same response shape, no UI change.
- Same atom, same field mappings → a metric means the same thing everywhere.
Some differences are intentional improvements — the cube counts the full stream, so it fixes AE's sampling undercount. We quantify every delta before any cutover.
| Risk | How we de-risk it |
|---|---|
| One tag outgrows one engine (10 GB) | shard by cell-key, merge on read — the seam is built in from day 1 |
| Numbers don't match today's dashboards | dual-run + reconcile per metric before any cutover |
| The new write path fails | fail-safe tap — collection & dashboards are never blocked |
| A ~2% approximation error surprises someone on a headline count | exact mode (roaring bitmaps) available per metric |
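The dual-run reconciliation could be as simple as a per-metric relative delta with a tolerance. The sketch below is an assumption about how that check might look: `reconcile`, the 2% default tolerance and the sample numbers are all illustrative.

```python
def reconcile(legacy, cube, tolerance=0.02):
    """Compare cube values against today's numbers, metric by metric."""
    report = {}
    for metric, old in legacy.items():
        new = cube.get(metric)
        if new is None:
            report[metric] = {"delta": None, "ok": False}   # metric missing from the cube
            continue
        if old == 0:
            delta = 0.0 if new == 0 else float("inf")
        else:
            delta = (new - old) / old                       # relative difference
        report[metric] = {"delta": delta, "ok": abs(delta) <= tolerance}
    return report


# Made-up numbers: counters match, but the legacy user count is lower
# (for example because the legacy source was sampled).
legacy = {"events": 1000, "revenue": 500.0, "users": 940}
cube = {"events": 1000, "revenue": 500.0, "users": 1000}

report = reconcile(legacy, cube)
for metric, line in report.items():
    print(metric, line)
```

A metric flagged as not ok is not necessarily a cube bug; it is a delta to explain and sign off before that metric is cut over.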
Extend & modify
A new question is a manifest row — not a rebuild.
The engine (ingest → aggregate → seal → cube) never changes. New needs are entries in one versioned manifest. They come in three kinds, distinguished by what they cost.
The manifest drives:
- sessions · users
- events · revenue · purchases
- funnel steps
- consent
Concrete: marketing starts sending utm_content next week → +1 manifest dimension + a one-time backfill. Ingest · aggregate · seal untouched; the MCP answers "conversions by utm_content" on the very next call.
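As an illustration of "a manifest row, not a rebuild", the `utm_content` change might reduce to something like this. The manifest layout, the `utm` cuboid name, the `needs_backfill` field and the `add_dimension` helper are all hypothetical; the real manifest format is not specified in this brief.

```python
import copy

# Hypothetical manifest: a version number plus the dimensions of each cuboid.
manifest = {
    "version": 1,
    "cuboids": {
        "core": ["channel", "device", "country", "consent"],
        "funnel": ["step"],
    },
}


def add_dimension(manifest, cuboid_id, dimension):
    """Return a new manifest version with one extra dimension; the input is left untouched."""
    updated = copy.deepcopy(manifest)
    dims = updated["cuboids"].setdefault(cuboid_id, [])
    if dimension not in dims:
        dims.append(dimension)
        updated["version"] += 1
        # Flag the one-time backfill that the new dimension requires.
        updated["needs_backfill"] = [(cuboid_id, dimension)]
    return updated


v2 = add_dimension(manifest, "utm", "utm_content")
print(v2["version"], v2["cuboids"]["utm"])
```

The engine code does not appear anywhere in this change, which is the point: only data describing the model moves.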
Cost — there is no flat "cost per tag"
It's ingest + storage.
Storage = the cuboids you switch on.
Ingest scales with the tag's events. Storage = cells × ~3.5 KB (two HLL sketches + counters), and cells = the cuboids you enable × time-buckets. Switching cuboids on or off moves both the DB size (vs the 10 GB per-engine wall) and the bill.
Modeled, real unit prices: ~3.5 KB/cell (2 HLLs) · buckets = compacted recent (~200) + retention days · ingest $0.166/M events · DO-SQLite storage $0.20/GB-mo · 10 GB = one engine's limit · non-empty combos only.
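The cost model above can be written out as a small calculation. The unit prices, cell size and bucket formula come from the line above; the example tag (50M events a month, the combo counts, 365 retention days) and the use of binary KB/GB are assumptions for illustration.

```python
CELL_BYTES = 3.5 * 1024            # ~3.5 KB per cell (two HLL sketches + counters)
INGEST_USD_PER_M_EVENTS = 0.166
STORAGE_USD_PER_GB_MONTH = 0.20
ENGINE_LIMIT_GB = 10               # one engine's storage limit


def monthly_cost(events_per_month, combos_per_cuboid, retention_days, recent_buckets=200):
    """Estimate one tag's monthly bill from its enabled cuboids and event volume."""
    buckets = recent_buckets + retention_days            # compacted recent + retention days
    cells = sum(combos_per_cuboid.values()) * buckets    # non-empty combos only
    storage_gb = cells * CELL_BYTES / 1024 ** 3
    ingest_usd = events_per_month / 1_000_000 * INGEST_USD_PER_M_EVENTS
    storage_usd = storage_gb * STORAGE_USD_PER_GB_MONTH
    return {
        "cells": cells,
        "storage_gb": storage_gb,
        "ingest_usd": ingest_usd,
        "storage_usd": storage_usd,
        "total_usd": ingest_usd + storage_usd,
        "fits_one_engine": storage_gb <= ENGINE_LIMIT_GB,
    }


# Hypothetical mid-traffic tag with three cuboids switched on.
estimate = monthly_cost(
    events_per_month=50_000_000,
    combos_per_cuboid={"core": 2000, "funnel": 4, "consent": 60},
    retention_days=365,
)
for name, value in estimate.items():
    print(name, value)
```

In this example ingest dominates the bill while storage decides whether the tag still fits in one engine, so the two levers are worth tracking separately.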
The ask
Greenlight a scoped v1 — then a staged rollout.
- Cube + router + MCP on a small, representative set of tags (low / mid / high traffic).
- The cuboids that reproduce today's dashboards (core · funnel · consent) — parity first.
- Dual-run + reconcile vs lake / AE — no cutover.
- One MCP answer a dashboard can't give (cross-segment / ad-hoc).
- Cut over one surface, one tag at a time — re-point = instant rollback.
- Switch on headroom cuboids (region · product · utm) as the MCP needs them.
- Roll across the fleet; shard the heaviest tags.
- Open the adapters / MCP to more surfaces & future apps.
Open questions
- Which cuboids first? Core reproduces the dashboards; which headroom dims (region / product / utm) matter most for the MCP?
- Exact vs ≈2% for headline distinct counts.
- Lifetime backfill — how far back to seed all-time totals.
- Pilot tag selection · privacy floor (k-anon) · currency.
Process once. Serve many.