Executive brief · the data layer

One data layer.
Process once. Serve many.

Make the platform MCP-ready (Model Context Protocol), so that an AI assistant, every dashboard, and every future app read one fresh, governed, pre-aggregated model instead of each hammering the raw stores on every query.

Today → the gap → the proposal → how it's stored → how the AI uses it → backward-compatible → extensible → cost → open questions → the ask

Where we are today

Two stores. Two dashboards.
Queried directly, every visit.

the atom: EdgeTag worker · every event · all ~2,000 tags

↓

lake.events · R2 · per-event · full history · DN subscribers → Data Nexus

AE state · per-user · ~90 days · sampled · all tags → AudienceSense

Each dashboard re-queries its raw store on every load, with a cache keyed to its fixed chart shapes. It works — for a known, finite set of dashboards.

The gap

An AI assistant asks anything.
The raw stores can't keep up.

Caching stops working. Dashboard caches key on fixed query shapes. An AI asks arbitrary ones → almost every query is a cache miss → a fresh raw scan.
The raw stores hit their limits. No fast distinct-count or window functions, hard row + rate caps, and AE is sampled (it undercounts) — so answers are slow, capped, or approximate.
Cost runs away. Raw querying is billed on queries × bytes scanned. An AI fans out many arbitrary queries per question, across ~2,000 tags — unbounded by anything we control.

The MCP needs answers the raw stores can't give cheaply, freshly, or within their limits. We need a different shape of data underneath it.

Why this approach

You can't cache a question you can't predict.

A dashboard asks the same few questions — so you cache and tune for them. An AI assistant asks anything, so there's nothing fixed to optimise for. The fix isn't a faster way to query the raw data on every question — it's to stop querying raw data per question at all, and pre-build a compact model you read instead.

Summarise once, read forever. Fold the event stream into a compact model as it arrives. Every question, expected or brand-new, becomes a cheap lookup, never a fresh scan. → predictable cost · ≤1-min fresh · ≈$0 per answer.
Sketches, not raw rows. Count millions of distinct people in ~1.5 KB sketches that merge across any date range or slice. → slice the data any way, with tiny memory and zero raw scanning.
Per-tag, with a fenced escape hatch. Each tenant's model is isolated (their data never leaves their own store), and the rare question the model can't answer takes a guarded path to raw data, never an open firehose.

Build the answer once → every answer is fast, cheap and bounded, and the existing stores keep working untouched.
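The "sketches merge" claim can be illustrated with a toy HyperLogLog. This is a minimal sketch under stated assumptions, not the production implementation: the register count (2,048, which packs to about 1.5 KB at 6 bits per register), the SHA-1 hash, and the `ToyHLL` name are all chosen here for illustration.

```python
import hashlib
import math

P = 11        # assumed precision: 2**11 = 2,048 registers
M = 1 << P    # 2,048 registers x 6 bits = 1,536 bytes when packed


class ToyHLL:
    """Toy HyperLogLog; one byte per register for clarity, not bit-packed."""

    def __init__(self):
        self.registers = bytearray(M)

    def add(self, item: str) -> None:
        # 64-bit hash: the top P bits pick a register, the rest give the rank.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - P)
        rest = h & ((1 << (64 - P)) - 1)
        # rank = 1-based position of the first 1-bit in the remaining bits
        rank = (64 - P) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other: "ToyHLL") -> "ToyHLL":
        # Union of two sketches = element-wise max of their registers.
        out = ToyHLL()
        out.registers = bytearray(
            max(a, b) for a, b in zip(self.registers, other.registers)
        )
        return out

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / M)
        raw = alpha * M * M / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * M and zeros:
            return M * math.log(M / zeros)  # small-range correction
        return raw


# Two overlapping days of hypothetical users, sketched separately.
monday, tuesday = ToyHLL(), ToyHLL()
for i in range(0, 6000):
    monday.add(f"user-{i}")
for i in range(4000, 10000):
    tuesday.add(f"user-{i}")

both_days = monday.merge(tuesday)  # no raw rows re-read
```

The merged sketch is register-for-register identical to one built from the combined stream, which is why any date range or slice can be assembled from stored cells. With 2,048 registers the standard error is about 1.04/√2048 ≈ 2.3%.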

What I'm proposing

One pre-aggregated layer in between.

the atom · unchanged: EdgeTag worker · one event → fans out, as it does today

↓

lake.events → Data Nexus · unchanged

AE state → AudienceSense · unchanged

the one new tap: miniEvent → per-tag CUBE · aggregated in-flight · sealed ≤1 min · new · additive · fail-safe

↓ only the cube continues

Router / Governor: picks the lane (cube · governed-raw · refuse)

↓

the new surface: MCP server ⇄ AI assistant · the LLM calls the cube as tools

Adapters → dashboards: Data Nexus · AudienceSense · same numbers

The same event also feeds a per-tag cube. One router serves the AI (through the MCP) and the existing dashboards from that single model. The current writers don't change.

How it's organised — and why it stays small

A few pivot tables,
rolled up over time.

Cuboids = the pivots people actually use

A cuboid is a pre-built pivot table for one set of dimensions. Storing every combination explodes — so we keep the handful people actually slice by:

core — channel × device × country × consent → "sessions / revenue by any of these, in any combination"
funnel — by step → "how many reach view → cart → checkout → buy"
consent — country × consent state → the privacy view
geo — by region

Fits a cuboid → instant lookup. A new slice = add a cuboid (one manifest row), not a rebuild.

Rollups = how it stays bounded

Recent data is kept by the minute (fresh, good for forensics). As it ages, a scheduled job rolls it up coarser, then expires it:

minute (today) → hour (~14 days) → day (180 days) → expire

Rolling up is a merge, not a re-scan — counters add, sketches union (the whole point of sketches). Fine detail where you need it, coarse where you don't → bounded storage.
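The merge can be sketched in a few lines. This is an illustration only: a Python `set` stands in for the HLL sketch (its union behaves like a sketch merge), `period_start` is assumed to be epoch seconds, and the cell layout is a simplified version of the table in this section.

```python
from collections import defaultdict


def roll_up(minute_cells):
    """Merge minute cells into hour cells: counters add, sketches union."""
    hours = defaultdict(lambda: {"events": 0, "revenue": 0.0, "users": set()})
    for cell in minute_cells:
        # Truncate the minute bucket to the start of its hour.
        hour_start = cell["period_start"] - cell["period_start"] % 3600
        key = (hour_start, cell["cuboid_id"], cell["dim_key"])
        hours[key]["events"] += cell["events"]
        hours[key]["revenue"] += cell["revenue"]
        hours[key]["users"] |= cell["users"]  # set union stands in for HLL merge
    return dict(hours)


DIM = "paid·mobile·US"
minute_cells = [
    {"period_start": 3600, "cuboid_id": "core", "dim_key": DIM,
     "events": 5, "revenue": 20.0, "users": {"a", "b"}},
    {"period_start": 3660, "cuboid_id": "core", "dim_key": DIM,
     "events": 3, "revenue": 0.0, "users": {"b", "c"}},
    {"period_start": 7200, "cuboid_id": "core", "dim_key": DIM,
     "events": 2, "revenue": 9.5, "users": {"a"}},
]
hour_cells = roll_up(minute_cells)
```

The first two minute cells collapse into one hour cell with 8 events and 3 distinct users; user "b" is counted once, which a plain sum of per-minute distinct counts would get wrong.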

Pre-build the pivots people use; keep them by the minute when fresh, roll up to days as they age — that's what fits a busy tag inside its storage budget.

CREATE TABLE cells (        -- the cube: 1 row per slice
  grain        TEXT,    -- minute | hour | day  (time tier)
  period_start INTEGER, -- which time bucket
  cuboid_id    TEXT,    -- which cuboid (dim-combo)
  dim_key      TEXT,    -- the dim values (e.g. paid·mobile·US)
  events       INTEGER, -- exact counter
  revenue      REAL,    -- exact counter
  purchases    INTEGER, -- exact counter
  users_hll    BLOB,    -- distinct users  (sketch, ~1.5 KB)
  sessions_hll BLOB     -- distinct sessions (sketch)
);
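A minimal sketch of reading the table above, using Python's built-in `sqlite3` with an in-memory database. The sample rows, the `totals_by_dim` helper, and the window bounds are hypothetical; the sketch columns are left NULL because merging HLL blobs would happen in application code, not in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cells (
      grain TEXT, period_start INTEGER, cuboid_id TEXT, dim_key TEXT,
      events INTEGER, revenue REAL, purchases INTEGER,
      users_hll BLOB, sessions_hll BLOB
    )""")

DAY = 86400
rows = [
    ("day",  0,   "core", "paid·mobile·US",     120, 300.0, 6, None, None),
    ("day",  DAY, "core", "paid·mobile·US",      80, 150.0, 3, None, None),
    ("day",  DAY, "core", "organic·desktop·DE",  40,   0.0, 0, None, None),
    ("hour", DAY, "core", "paid·mobile·US",      10,  25.0, 1, None, None),
]
conn.executemany("INSERT INTO cells VALUES (?,?,?,?,?,?,?,?,?)", rows)


def totals_by_dim(conn, cuboid_id, grain, start, end):
    """Sum the exact counters per dim_key over [start, end) at one grain."""
    cur = conn.execute(
        """SELECT dim_key, SUM(events), SUM(revenue), SUM(purchases)
           FROM cells
           WHERE cuboid_id = ? AND grain = ?
             AND period_start >= ? AND period_start < ?
           GROUP BY dim_key
           ORDER BY dim_key""",
        (cuboid_id, grain, start, end),
    )
    return cur.fetchall()


result = totals_by_dim(conn, "core", "day", 0, 2 * DAY)
```

Filtering on a single grain matters: the same period exists at several tiers while it is being rolled up, so mixing grains in one sum would double-count.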

topn: high-cardinality dims (product / page / utm) → keep top-N + an "other" bucket.

dist: percentiles (e.g. order value) → a mergeable sketch.

lifetime: never-reset all-time totals, kept separately because you can't sum windows.

manifest: the versioned model (dims, measures, cuboids) that drives the engine + describe_model.
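The top-N idea for high-cardinality dimensions can be sketched as follows. The function name and the sample page counts are hypothetical, and the sketch assumes no real key is literally named "other".

```python
from collections import Counter


def top_n_with_other(counts, n):
    """Keep the n largest keys; fold everything else into one 'other' bucket."""
    kept = dict(Counter(counts).most_common(n))
    other = sum(counts.values()) - sum(kept.values())
    if other:
        kept["other"] = other
    return kept


page_views = {"/home": 900, "/pricing": 400, "/blog/a": 30, "/blog/b": 20, "/blog/c": 5}
compact = top_n_with_other(page_views, 2)
```

The grand total is preserved (1,355 views before and after), so headline numbers still reconcile while the long tail stops consuming cells.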

How the AI uses the data

The AI doesn't see raw data.
It calls tools over the cube.

AI assistant: "which users added to cart AND bought?" → MCP server: 4 typed tools → Router: cube · governed-raw · refuse → Cube: pick cells · merge sketches ↩ answer: "79 users, exact"

describe_model: what's queryable
query: a metric by a dimension, any window
intersect: overlap of two segments
lifetime: all-time totals

The router enforces the rules, not the AI — the AI just picks a tool and explains the answer.

"Governed raw," in plain terms: the cube answers almost everything. For a rare, very specific question it wasn't built for — e.g. "events in the last 12 minutes by region" — the AI may read the raw event log, but only through a locked gate: a hard time-window cap, a column whitelist, totals only (never names or PII), and the AI never writes the query itself. The controlled opposite of today's "point the AI at the whole raw store."

Question	Lane
in-model (e.g. "sessions by channel, last 7d")	CUBE — any window, ≤1-min fresh, $0
event-level / small recent window	GOVERNED RAW — capped, redacted, ~cents
out-of-model + big window	REFUSE → offer the cube view, or promote it
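The lane choice and the governed-raw gate can be sketched together. Everything named here is an assumption for illustration: the dimension names inside each cuboid, the 15-minute raw window cap, the column whitelist, and the `RawRequest` shape. The point it shows is that the AI supplies structured parameters and the router applies the rules.

```python
from dataclasses import dataclass

# Assumed dimension sets for the four cuboids named earlier.
CUBOIDS = {
    "core": {"channel", "device", "country", "consent"},
    "funnel": {"step"},
    "consent": {"country", "consent"},
    "geo": {"region"},
}
RAW_WINDOW_CAP_SECONDS = 15 * 60                   # assumed hard cap
ALLOWED_COLUMNS = {"region", "channel", "device"}  # assumed whitelist
ALLOWED_AGGREGATES = {"count", "sum"}              # totals only


def pick_lane(dims, window_seconds):
    """Return (lane, cuboid): cube if a cuboid covers the dims, else raw or refuse."""
    wanted = set(dims)
    for name, cuboid_dims in CUBOIDS.items():
        if wanted <= cuboid_dims:
            return ("cube", name)
    if window_seconds <= RAW_WINDOW_CAP_SECONDS:
        return ("governed-raw", None)
    return ("refuse", None)


@dataclass(frozen=True)
class RawRequest:
    group_by: tuple
    aggregate: str
    window_seconds: int


def check_raw_request(req):
    """Return a list of violations; an empty list means the request may run."""
    problems = []
    if req.window_seconds > RAW_WINDOW_CAP_SECONDS:
        problems.append("window exceeds cap")
    blocked = [c for c in req.group_by if c not in ALLOWED_COLUMNS]
    if blocked:
        problems.append("columns not whitelisted: " + ", ".join(blocked))
    if req.aggregate not in ALLOWED_AGGREGATES:
        problems.append("only totals are allowed")
    return problems


in_model = pick_lane(["channel"], 7 * 86400)
small_raw = pick_lane(["utm_content"], 12 * 60)
too_big = pick_lane(["utm_content"], 30 * 86400)
```

Because the request is a typed structure rather than free-form SQL, the AI never writes the raw query itself; the server builds it only after the checks pass.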

Backward compatible by design

Same numbers. No dashboard changes.

Why nothing breaks

The new tap is additive & fail-safe — if the cube ever fails, collection and today's dashboards are unaffected.
Adapters answer the dashboards' existing queries against the cube — same response shape, no UI change.
Same atom, same field mappings → a metric means the same thing everywhere.

Rollout — never big-bang

1. Dual-run: cube builds in shadow; dashboards untouched.

2. Reconcile: diff cube vs lake/AE per metric; sign off.

3. Cut over: per surface, per tag. Re-pointing a read = instant rollback.

Some differences are intentional improvements — the cube counts the full stream, so it fixes AE's sampling undercount. We quantify every delta before any cutover.
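The reconcile step can be sketched as a per-metric diff. The metric totals and the 2% tolerance below are hypothetical inputs, not measured results; a failing row is a prompt for review and sign-off, not necessarily a bug.

```python
def reconcile(cube, baseline, tolerance=0.02):
    """Compare per-metric totals; return (metric, relative delta, within tolerance)."""
    report = []
    for metric in sorted(baseline):
        expected = baseline[metric]
        actual = cube.get(metric)
        if actual is None:
            report.append((metric, None, False))  # metric missing from the cube
            continue
        if expected:
            delta = (actual - expected) / expected
        else:
            delta = float(actual != 0)
        report.append((metric, round(delta, 4), abs(delta) <= tolerance))
    return report


# Hypothetical totals for one tag and one day.
lake = {"events": 1_000_000, "purchases": 4_200, "users": 81_000}
cube = {"events": 1_000_000, "purchases": 4_200, "users": 82_300}
report = reconcile(cube, lake)

# Against a sampled store the cube is expected to read higher.
ae_report = reconcile({"users": 82_300}, {"users": 70_000})
```

In the second comparison the cube reads roughly 18% above the sampled baseline, so the row fails the tolerance and is flagged; this is the kind of intentional delta that gets quantified and signed off before cutover.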

Risk	How we de-risk it
One tag outgrows one engine (10 GB)	shard by cell-key, merge on read — the seam is built in from day 1
Numbers don't match today's dashboards	dual-run + reconcile per metric before any cutover
The new write path fails	fail-safe tap — collection & dashboards are never blocked
A ~2% approximation error surprises someone on a headline count	exact mode (roaring bitmaps) available per metric
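"Shard by cell-key, merge on read" from the first row can be sketched as below. The shard count, the key format, and the in-memory `Counter` shards are assumptions standing in for separate engines; only exact counters are shown, and sketches would be unioned on read instead of added.

```python
import zlib
from collections import Counter


def shard_for(cell_key: str, shard_count: int) -> int:
    # crc32 is stable across processes, unlike Python's built-in hash().
    return zlib.crc32(cell_key.encode()) % shard_count


def write(shards, cell_key, events):
    """Route one increment to the single shard that owns this cell key."""
    shards[shard_for(cell_key, len(shards))][cell_key] += events


def read_all(shards):
    """Merge on read: add up the counters from every shard."""
    merged = Counter()
    for shard in shards:
        merged.update(shard)  # Counter.update adds counts rather than replacing
    return merged


shards = [Counter() for _ in range(4)]
write(shards, "core|paid·mobile·US", 5)
write(shards, "core|paid·mobile·US", 3)
write(shards, "core|organic·desktop·DE", 2)
totals = read_all(shards)
```

Each cell key always lands on the same shard, so writes never need coordination and a read is a plain merge of the per-shard results.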

Extend & modify

A new question is a manifest row — not a rebuild.

The engine (ingest → aggregate → seal → cube) never changes. New needs are entries in one versioned manifest, in three kinds distinguished by what they cost.

the engine · fixed: ingest → aggregate → seal → cube · unchanged ✓

manifest drives → what the MCP can answer: sessions · users · events · revenue · purchases · funnel steps · consent

Concrete: marketing starts sending utm_content next week → +1 manifest dimension + a one-time backfill. Ingest · aggregate · seal untouched; the MCP answers "conversions by utm_content" on the very next call.
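The utm_content example can be sketched as a change to manifest data only. The `Manifest` shape and the `add_dimension` helper are hypothetical; the brief does not specify the manifest format, and the one-time backfill is out of scope here.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Manifest:
    version: int
    dimensions: tuple
    measures: tuple
    cuboids: dict = field(default_factory=dict)  # cuboid name -> tuple of dims


def add_dimension(m, dim, cuboid_name):
    """Return a new manifest version with one more dimension and a cuboid for it."""
    if dim in m.dimensions:
        return m  # already modelled; nothing to change
    cuboids = dict(m.cuboids)
    cuboids[cuboid_name] = (dim,)
    return replace(
        m,
        version=m.version + 1,
        dimensions=m.dimensions + (dim,),
        cuboids=cuboids,
    )


v1 = Manifest(
    version=1,
    dimensions=("channel", "device", "country", "consent"),
    measures=("events", "revenue", "purchases"),
    cuboids={"core": ("channel", "device", "country", "consent")},
)
v2 = add_dimension(v1, "utm_content", "utm_content")
```

The engine code is untouched: the new version is plain data, and the old version stays intact for rollback or comparison.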

Cost — there is no flat "cost per tag"

It's ingest + storage.
Storage = the cuboids you switch on.

Ingest scales with the tag's events. Storage = cells × ~3.5 KB (two HLL sketches + counters), and cells = the cuboids you enable × time-buckets. Enabling or disabling a cuboid moves both the DB size (against the 10 GB per-engine wall) and the bill.

Example inputs: events / day 14M · day-grain retention 180d


Modeled estimate using real unit prices: ~3.5 KB/cell (2 HLLs) · buckets = compacted recent (~200) + retention days · ingest $0.166/M events · DO-SQLite storage $0.20/GB-mo · 10 GB = one engine's limit · non-empty combos only.
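The arithmetic behind the model can be written out directly. The unit prices and constants come from the line above; the count of non-empty dimension combinations (2,000 here), the 30-day month, and decimal units (1 GB = 1,000,000 KB) are assumptions made for this worked example.

```python
KB_PER_CELL = 3.5                 # two HLL sketches + counters
RECENT_BUCKETS = 200              # compacted recent minute/hour buckets
INGEST_USD_PER_M_EVENTS = 0.166
STORAGE_USD_PER_GB_MONTH = 0.20
ENGINE_LIMIT_GB = 10.0


def estimate(events_per_day, retention_days, nonempty_combos, days_per_month=30):
    """Estimate per-tag DB size and monthly cost from the stated unit prices."""
    buckets = RECENT_BUCKETS + retention_days
    cells = nonempty_combos * buckets
    size_gb = cells * KB_PER_CELL / 1_000_000  # decimal units assumed
    monthly_events_m = events_per_day * days_per_month / 1_000_000
    return {
        "cells": cells,
        "size_gb": size_gb,
        "fits_one_engine": size_gb <= ENGINE_LIMIT_GB,
        "storage_usd_month": size_gb * STORAGE_USD_PER_GB_MONTH,
        "ingest_usd_month": monthly_events_m * INGEST_USD_PER_M_EVENTS,
    }


example = estimate(14_000_000, 180, nonempty_combos=2_000)
```

Under these assumptions the tag holds 760,000 cells, about 2.66 GB, costing roughly $0.53 per month to store and $69.72 per month to ingest, so ingest dominates and the DB stays well inside one engine.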

The ask

Greenlight a scoped v1 — then a staged rollout.

In scope · v1

Cube + router + MCP on a small, representative set of tags (low / mid / high traffic).
The cuboids that reproduce today's dashboards (core · funnel · consent) — parity first.
Dual-run + reconcile vs lake / AE — no cutover.
One MCP answer a dashboard can't give (cross-segment / ad-hoc).

Next · after sign-off

Cut over one surface, one tag at a time — re-point = instant rollback.
Switch on headroom cuboids (region · product · utm) as the MCP needs them.
Roll across the fleet; shard the heaviest tags.
Open the adapters / MCP to more surfaces & future apps.

Decide together · product

Which cuboids first? Core reproduces the dashboards — which headroom dims (region / product / utm) matter most for the MCP?
Exact vs ≈2% for headline distinct counts.
Lifetime backfill — how far back to seed all-time totals.
Pilot tag selection · privacy floor (k-anon) · currency.

Process once. Serve many.
