Executive brief · the data layer
One data layer.
Process once. Serve many.
Make the platform MCP-ready — so an AI assistant, every dashboard, and every future app read one fresh, governed, pre-aggregated model instead of each hammering the raw stores on every query.
Today → the gap → the proposal → how it's stored → how the AI uses it → backward-compatible → extensible → cost → open questions → the ask
Where we are today
Two stores. Two dashboards.
Queried directly, every visit.
Each dashboard re-queries its raw store on every load, with a cache keyed to its fixed chart shapes. It works — for a known, finite set of dashboards.
The gap
An AI assistant asks anything.
The raw stores can't keep up.
- Caching stops working. Dashboard caches key on fixed query shapes. An AI asks arbitrary ones → almost every query is a cache miss → a fresh raw scan.
- The raw stores hit their limits. No fast distinct-count or window functions, hard row + rate caps, and AE is sampled (it undercounts) — so answers are slow, capped, or approximate.
- Cost runs away. Raw querying is billed on queries × bytes scanned. An AI fans out many arbitrary queries per question, across ~2,000 tags — unbounded by anything we control.
The MCP needs answers the raw stores can't give cheaply, freshly, or within their limits. We need a different shape of data underneath it.
Why this approach
You can't cache a question you can't predict.
A dashboard asks the same few questions — so you cache and tune for them. An AI assistant asks anything, so there's nothing fixed to optimise for. The fix isn't a faster way to query the raw data on every question — it's to stop querying raw data per question at all, and pre-build a compact model you read instead.
- Summarise once, read foreverFold the event stream into a compact model as it arrives. Every question — expected or brand-new — becomes a cheap lookup, never a fresh scan. → predictable cost · ≤1-min fresh · ≈$0 per answer.
- Sketches, not raw rowsCount millions of distinct people in ~1.5 KB that merge across any date range or slice. → slice the data any way, with tiny memory and zero raw scanning.
- Per-tag, with a fenced escape hatchEach tenant's model is isolated — their data never leaves their own store — and the rare question the model can't answer takes a guarded path to raw data, never an open firehose.
Build the answer once → every answer is fast, cheap and bounded, and the existing stores keep working untouched.
What I'm proposing
One pre-aggregated layer in between.
The same event also feeds a per-tag cube. One router serves the AI (through the MCP) and the existing dashboards from that single model. The current writers don't change.
How the data is stored
We store the answer, not the events.
CREATE TABLE cells ( -- the cube: 1 row per slice grain TEXT, -- minute | hour | day (time tier) period_start INTEGER, -- which time bucket cuboid_id TEXT, -- which dimension-combo dim_key TEXT, -- the dim values (e.g. paid·mobile·US) events INTEGER, -- exact counter revenue REAL, -- exact counter purchases INTEGER, -- exact counter users_hll BLOB, -- distinct users (sketch, ~1.5 KB) sessions_hll BLOB -- distinct sessions (sketch) );
CREATE TABLE lifetime ( -- all-time headline totals measure TEXT, -- users_ever | buyers_ever | … sketch BLOB, -- never-reset distinct count counter INTEGER );
- grain + period_starttime-tiered (minute→hour→day, then expire) so storage stays bounded, not infinite.
- cuboid_id + dim_keyname the exact slice → a filtered question picks cells, never scans events.
- events · revenue · purchasesplain counters → exact sums; de-duplicated by event_id.
- users_hll · sessions_hlldistinct counts in ~1.5 KB regardless of volume, and mergeable across time + shards.
- lifetimeall-time totals need their own store — you can't sum windows (you'd double-count people).
Every question becomes "pick the matching cells and merge them" — fresh in ≤1 minute, and effectively $0 to read.
How the AI uses the data
The AI doesn't see raw data.
It calls tools over the cube.
The router enforces the rules, not the AI — the AI just picks a tool and explains the answer.
| Question | Lane |
|---|---|
| in-model (e.g. "sessions by channel, last 7d") | CUBE — any window, ≤1-min fresh, $0 |
| event-level / small recent window | GOVERNED RAW — capped, redacted, ~cents |
| out-of-model + big window | REFUSE → offer the cube view, or promote it |
Backward compatible by design
Same numbers. No dashboard changes.
- The new tap is additive & fail-safe — if the cube ever fails, collection and today's dashboards are unaffected.
- Adapters answer the dashboards' existing queries against the cube — same response shape, no UI change.
- Same atom, same field mappings → a metric means the same thing everywhere.
Some differences are intentional improvements — the cube counts the full stream, so it fixes AE's sampling undercount. We quantify every delta before any cutover.
Extend & modify
A new question is a manifest row — not a rebuild.
The engine — ingest → aggregate → seal → cube — never changes. New needs are entries in one versioned manifest. Three kinds, by what they cost — click one:
drives →
- sessions · users
- events · revenue · purchases
- funnel steps
- consent
Concrete: marketing starts sending utm_content next week → +1 manifest dimension + a one-time backfill. Ingest · aggregate · seal untouched; the MCP answers "conversions by utm_content" on the very next call.
Cost implications
Bounded by what we control.
Not by what the AI asks.
The cube is built once, paid for by event volume we already control (≈ $70/mo for the 14M-events/day tag; ≈ $500/mo across the fleet). The AI then reads it for ≈ $0. Pointing the AI at raw data instead scales with every question it asks.
Modeled on real Cloudflare unit prices; reads served from the pre-built cube (bounded D1 range-reads). Full model in the engineering doc.
What we need to decide — and the risks we're managing
Open questions. Known risks.
- Exact vs ~2% approx for headline distinct counts (per metric).
- Lifetime backfill — how far back do we seed all-time totals?
- Launch scope — which dimensions/cuboids do we expose first?
- Privacy floor & currency — min cohort size; multi-currency handling.
- One tag outgrows one engine → shard by cell-key, merge on read (seam built in from day 1).
- Numbers don't match → dual-run + reconcile per metric before any cutover.
- New write path fails → fail-safe tap; collection & dashboards never blocked.
- Sampling/approx surprises → exact mode available for headline metrics.
None are blockers — they're the choices a pilot resolves with real data.
The ask
Greenlight v1.
Pilot the 14M/day tag.
Process once. Serve many.