MCP-Ready · Executive Brief
01/11
Blotout

Executive brief · the data layer

One data layer.
Process once. Serve many.

Make the platform MCP-ready — so an AI assistant, every dashboard, and every future app read one fresh, governed, pre-aggregated model instead of each hammering the raw stores on every query.

Today → the gap → the proposal → how it's stored → how the AI uses it → backward-compatible → extensible → cost → open questions → the ask

Where we are today

Two stores. Two dashboards.
Queried directly, every visit.

the atomEdgeTag workerevery event · all ~2,000 tags
lake.events · R2per-event · full history · DN subscribers→ Data Nexus
AE stateper-user · ~90 days · sampled · all tags→ AudienceSense

Each dashboard re-queries its raw store on every load, with a cache keyed to its fixed chart shapes. It works — for a known, finite set of dashboards.

The gap

An AI assistant asks anything.
The raw stores can't keep up.

  • Caching stops working. Dashboard caches key on fixed query shapes. An AI asks arbitrary ones → almost every query is a cache miss → a fresh raw scan.
  • The raw stores hit their limits. No fast distinct-count or window functions, hard row + rate caps, and AE is sampled (it undercounts) — so answers are slow, capped, or approximate.
  • Cost runs away. Raw querying is billed on queries × bytes scanned. An AI fans out many arbitrary queries per question, across ~2,000 tags — unbounded by anything we control.

The MCP needs answers the raw stores can't give cheaply, freshly, or within their limits. We need a different shape of data underneath it.

Why this approach

You can't cache a question you can't predict.

A dashboard asks the same few questions — so you cache and tune for them. An AI assistant asks anything, so there's nothing fixed to optimise for. The fix isn't a faster way to query the raw data on every question — it's to stop querying raw data per question at all, and pre-build a compact model you read instead.

  • Summarise once, read foreverFold the event stream into a compact model as it arrives. Every question — expected or brand-new — becomes a cheap lookup, never a fresh scan. → predictable cost · ≤1-min fresh · ≈$0 per answer.
  • Sketches, not raw rowsCount millions of distinct people in ~1.5 KB that merge across any date range or slice. → slice the data any way, with tiny memory and zero raw scanning.
  • Per-tag, with a fenced escape hatchEach tenant's model is isolated — their data never leaves their own store — and the rare question the model can't answer takes a guarded path to raw data, never an open firehose.

Build the answer once → every answer is fast, cheap and bounded, and the existing stores keep working untouched.

What I'm proposing

One pre-aggregated layer in between.

the atom · unchangedEdgeTag workerone event → fans out, as it does today
lake.events→ Data Nexus · unchanged
AE state→ AudienceSense · unchanged
the one new tapminiEvent → per-tag CUBEaggregated in-flight · sealed ≤1 minnew · additive · fail-safe
only the cube continues
Router / Governorpicks the lane: cube · governed-raw · refuse
the new surfaceMCP server ⇄ AI assistantthe LLM calls the cube as tools
Adapters → dashboardsData Nexus · AudienceSense · same numbers

The same event also feeds a per-tag cube. One router serves the AI (through the MCP) and the existing dashboards from that single model. The current writers don't change.

How the data is stored

We store the answer, not the events.

CREATE TABLE cells (        -- the cube: 1 row per slice
  grain        TEXT,    -- minute | hour | day  (time tier)
  period_start INTEGER, -- which time bucket
  cuboid_id    TEXT,    -- which dimension-combo
  dim_key      TEXT,    -- the dim values (e.g. paid·mobile·US)
  events       INTEGER, -- exact counter
  revenue      REAL,    -- exact counter
  purchases    INTEGER, -- exact counter
  users_hll    BLOB,    -- distinct users  (sketch, ~1.5 KB)
  sessions_hll BLOB     -- distinct sessions (sketch)
);
CREATE TABLE lifetime (     -- all-time headline totals
  measure TEXT,  -- users_ever | buyers_ever | …
  sketch  BLOB,  -- never-reset distinct count
  counter INTEGER
);
  • grain + period_starttime-tiered (minute→hour→day, then expire) so storage stays bounded, not infinite.
  • cuboid_id + dim_keyname the exact slice → a filtered question picks cells, never scans events.
  • events · revenue · purchasesplain counters → exact sums; de-duplicated by event_id.
  • users_hll · sessions_hlldistinct counts in ~1.5 KB regardless of volume, and mergeable across time + shards.
  • lifetimeall-time totals need their own store — you can't sum windows (you'd double-count people).

Every question becomes "pick the matching cells and merge them" — fresh in ≤1 minute, and effectively $0 to read.

topnhigh-cardinality dims (product / page / utm) → keep top-N + an "other" bucket.
distpercentile distributions (e.g. time-to-purchase) → a mergeable sketch.
user_anchorone thin row per user-ever (first touch, is_new) — seeded once, then incremented. The heaviest table.
manifestthe versioned model definition — dims, measures, cuboids — drives the whole engine + the catalog the MCP reads.

How the AI uses the data

The AI doesn't see raw data.
It calls tools over the cube.

AI assistant"which users added to cart AND bought?" MCP server4 typed tools Routercube · governed-raw · refuse Cubepick cells · merge sketches answer"79 users — exact"
describe_model what's queryable query a metric by a dimension, any window intersect overlap of two segments lifetime all-time totals

The router enforces the rules, not the AI — the AI just picks a tool and explains the answer.

"Governed raw," in plain terms: the cube answers almost everything. For a rare, very specific question it wasn't built for — e.g. "events in the last 12 minutes by region" — the AI may read the raw event log, but only through a locked gate: a hard time-window cap, a column whitelist, totals only (never names or PII), and the AI never writes the query itself. The controlled opposite of today's "point the AI at the whole raw store."
QuestionLane
in-model (e.g. "sessions by channel, last 7d")CUBE — any window, ≤1-min fresh, $0
event-level / small recent windowGOVERNED RAW — capped, redacted, ~cents
out-of-model + big windowREFUSE → offer the cube view, or promote it

Backward compatible by design

Same numbers. No dashboard changes.

Why nothing breaks
  • The new tap is additive & fail-safe — if the cube ever fails, collection and today's dashboards are unaffected.
  • Adapters answer the dashboards' existing queries against the cube — same response shape, no UI change.
  • Same atom, same field mappings → a metric means the same thing everywhere.
Rollout — never big-bang
1Dual-run — cube builds in shadow; dashboards untouched.
2Reconcile — diff cube vs lake/AE per metric; sign off.
3Cut over — per surface, per tag. Re-pointing a read = instant rollback.

Some differences are intentional improvements — the cube counts the full stream, so it fixes AE's sampling undercount. We quantify every delta before any cutover.

Extend & modify

A new question is a manifest row — not a rebuild.

The engine — ingest → aggregate → seal → cube — never changes. New needs are entries in one versioned manifest. Three kinds, by what they cost — click one:

the engine · fixed ingest → aggregate → seal → cube unchanged ✓
manifest
drives →
what the MCP can answer
  • sessions · users
  • events · revenue · purchases
  • funnel steps
  • consent

Concrete: marketing starts sending utm_content next week → +1 manifest dimension + a one-time backfill. Ingest · aggregate · seal untouched; the MCP answers "conversions by utm_content" on the very next call.

Cost implications

Bounded by what we control.
Not by what the AI asks.

The layer — build the cube once bounded
~$500/mo · all ~2,000 tags
Raw query on every AI read (today's path) unbounded
∞ — grows with every AI query

The cube is built once, paid for by event volume we already control (≈ $70/mo for the 14M-events/day tag; ≈ $500/mo across the fleet). The AI then reads it for ≈ $0. Pointing the AI at raw data instead scales with every question it asks.

Modeled on real Cloudflare unit prices; reads served from the pre-built cube (bounded D1 range-reads). Full model in the engineering doc.

What we need to decide — and the risks we're managing

Open questions. Known risks.

Decisions we need
  • Exact vs ~2% approx for headline distinct counts (per metric).
  • Lifetime backfill — how far back do we seed all-time totals?
  • Launch scope — which dimensions/cuboids do we expose first?
  • Privacy floor & currency — min cohort size; multi-currency handling.
Risks → how we de-risk
  • One tag outgrows one engine → shard by cell-key, merge on read (seam built in from day 1).
  • Numbers don't match → dual-run + reconcile per metric before any cutover.
  • New write path fails → fail-safe tap; collection & dashboards never blocked.
  • Sampling/approx surprises → exact mode available for headline metrics.

None are blockers — they're the choices a pilot resolves with real data.

The ask

Greenlight v1.
Pilot the 14M/day tag.

Todayraw stores can't serve an AI The layerone cube · fresh · governed · cheap Backward-compatiblesame numbers · extensible

Process once. Serve many.