Trust & Safety

Five structural answers. Not policy promises.

The engine is fast. The question every serious buyer asks next is: how do you make this safe for actuarial work, defensible for trustees, and compliant with GDPR?

Each of the five answers below is a structural design choice, not a process commitment. They are properties of how the engine is built, not promises about how careful the team is. That distinction matters — structural properties hold under pressure; policy promises depend on the person who wrote them still being in the room.

01

The actuary owns the deliverable.

The human in the loop directs the engagement throughout. They frame the brief, make every actuarial judgement call, decide which interpretive readings to adopt, and sign off the answer.

The engine and the LLMs are tools that accelerate execution and review. The actuary owns the calculation and the conclusions drawn from it. Accountability never moves.

This is the difference between "an AI did this calculation" — which actuaries cannot sign off — and "AI-assisted developers built software that did this calculation" — which works the same way as every other piece of audited software.

02

No LLM in the runtime path.

Diorama is built with LLM assistance during development. The runtime is deterministic Python and Excel formulas. The same code produces the same outputs every time.

The LLM helps build the calculator. It does not enter the audit trail of any individual calculation. Once the engine is built and approved, LLMs play no further role in the money path.
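One way to make "same inputs, same outputs" checkable is to fingerprint each run and compare digests. The sketch below is illustrative only: the function names, the revaluation rule, and the input fields are assumptions made for the example, not Diorama's actual code. It uses `Decimal` so that rounding is explicit rather than left to binary floating point.

```python
import hashlib
import json
from decimal import Decimal, ROUND_HALF_UP


def revalued_pension(pension_at_leaving: Decimal, rate: Decimal, years: int) -> Decimal:
    """Hypothetical rule: compound fixed-rate revaluation, rounded to the penny."""
    factor = (Decimal(1) + rate) ** years
    return (pension_at_leaving * factor).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def run(inputs: dict) -> tuple:
    """Run the calculation and return (output, SHA-256 digest of inputs plus output)."""
    output = revalued_pension(
        Decimal(inputs["pension_at_leaving"]),
        Decimal(inputs["rate"]),
        inputs["years"],
    )
    # sort_keys gives a canonical serialisation, so equal runs hash identically.
    payload = json.dumps({"inputs": inputs, "output": str(output)}, sort_keys=True)
    return output, hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Illustrative inputs, passed as strings so no float rounding creeps in.
inputs = {"pension_at_leaving": "1000.00", "rate": "0.04", "years": 3}
first_output, first_digest = run(inputs)
second_output, second_digest = run(inputs)
assert first_digest == second_digest  # identical inputs, identical fingerprint
```

A stored digest of this kind could sit alongside each calculation in the audit trail, so a later re-run can be compared against it.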

deterministic Python
same inputs = same outputs
no hidden interpretation at runtime
no source-document reading at runtime
03

Every output is testable.

The calculator's formulas are mechanical translations of the specification's rule text. A regression harness re-implements each rule in Python independently and asserts the workbook's output matches. The harness runs on every rebuild.

Any drift between rule text, workbook formula, and harness reference implementation surfaces as a test failure before anything ships. The harness is the quality gate — a structural answer to "how do we know this is right?"

400+ regression assertions cover the full benefit cascade: GMP pre- and post-1988, excess revaluation, early and late retirement factors, commutation, Finance Act 2024 lump sum regime, spouse and dependant pensions, section/category logic.
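The harness idea can be sketched in a few lines. Everything specific here is an assumption for illustration: the rule (a 4% simple reduction per year of early retirement), the function names, and the `WORKBOOK_OUTPUTS` dictionary, which stands in for values that a real harness would read from the recalculated workbook.

```python
from decimal import Decimal, ROUND_HALF_UP


def early_retirement_reference(pension: Decimal, years_early: int) -> Decimal:
    """Independent re-implementation of a hypothetical rule:
    reduce the pension by 4% simple for each complete year taken early."""
    reduction = Decimal("0.04") * years_early
    reduced = pension * (Decimal(1) - reduction)
    return reduced.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Stand-in for workbook results, keyed by (pension, years early).
WORKBOOK_OUTPUTS = {
    ("10000.00", 0): "10000.00",
    ("10000.00", 3): "8800.00",
    ("7321.55", 5): "5857.24",
}


def check_workbook(outputs: dict) -> list:
    """Return one message per case where the workbook and the reference disagree."""
    failures = []
    for (pension, years_early), workbook_value in outputs.items():
        expected = early_retirement_reference(Decimal(pension), years_early)
        if Decimal(workbook_value) != expected:
            failures.append(
                f"pension={pension} years_early={years_early}: "
                f"workbook {workbook_value} != reference {expected}"
            )
    return failures


# An empty list means the workbook and the reference implementation agree.
assert check_workbook(WORKBOOK_OUTPUTS) == []
```

Because the reference is written separately from the workbook formula, an error has to be made twice, in the same way, to pass unnoticed.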

04

Two independent model families review each other.

A development LLM does the execution. A review LLM from a different model family, by a different vendor, audits the output — operating from the same source artefacts with no shared context.

The two work in tandem: the reviewer surfaces issues the execution model missed, the execution model addresses them, and successive rounds tighten the work until both converge. The execution model only ever sees the reviewer's findings as written recommendations to act on.

AI-on-AI peer review is structurally analogous to human peer review. It catches the things one model alone tends to miss — including coherence errors, missing edge cases, and assumption drift across a long engagement.
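The round-by-round protocol can be sketched as a simple loop. This is a minimal illustration under stated assumptions: `review_loop`, `stub_review`, and `stub_execute` are hypothetical names, and the two stubs are deterministic stand-ins for the models, which in practice would be calls to two different vendors. The reviewer receives only the artefact, and the executor receives only the written findings.

```python
from typing import Callable, List, Tuple


def review_loop(
    draft: str,
    execute: Callable[[str, List[str]], str],
    review: Callable[[str], List[str]],
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Alternate review and revision until the reviewer raises no findings."""
    for round_number in range(1, max_rounds + 1):
        findings = review(draft)  # the reviewer sees the artefact only
        if not findings:
            return draft, round_number
        draft = execute(draft, findings)  # the executor sees written findings only
    raise RuntimeError(f"no convergence after {max_rounds} rounds")


# Hypothetical checklist the stub reviewer enforces.
REQUIRED_TOPICS = ["late retirement", "commutation"]


def stub_review(draft: str) -> List[str]:
    """Stand-in reviewer: flag every required topic the draft does not mention."""
    return [f"Add handling for {topic}" for topic in REQUIRED_TOPICS if topic not in draft]


def stub_execute(draft: str, findings: List[str]) -> str:
    """Stand-in executor: address the first finding in each round."""
    topic = findings[0].removeprefix("Add handling for ")
    return f"{draft}\n{topic}: handled"


final_draft, rounds = review_loop("GMP: handled", stub_execute, stub_review)
```

The `max_rounds` cap is a design choice for the sketch: if the two sides never converge, the loop fails loudly, which is the point at which a human would step in.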

05

Anonymisation before any LLM access.

For engagements where source artefacts contain personal data, anonymisation moves into the pipeline itself. The LLM-driven classification step sees only structure — sheet names, column headers, named ranges, inferred types — never cell values.

1
Structure-only extraction

The LLM sees only sheet names, column headers, named ranges, and inferred types. It never sees cell values.

2
Header classification

Columns classified by type. PII-bearing columns identified before any data is read.

3
Human-in-the-loop confirmation

Classification reviewed and confirmed before substitution proceeds.

4
Deterministic substitution

PII values replaced with synthetic equivalents. Substitution map held locally and never transmitted.

5
Validation against known PII patterns

A regex-based check must find no matches for known PII patterns in the anonymised artefact before LLM access is granted.

6
LLM access to anonymised artefact only

The audit trail is available to the client's data protection officer on request.

The architectural lever is separation of "knowing what to do" from "doing it." The classifier sees only metadata and cannot physically access member data. The substitution map is held locally and never transmitted.
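Steps 1, 4, and 5 of the pipeline above can be sketched as follows. This is a sketch under stated assumptions, not the production pipeline: the in-memory `workbook` dictionary stands in for a parsed spreadsheet, all function names are hypothetical, the member records are invented, and the National Insurance number pattern is deliberately simplified. The `pii_columns` argument represents the human-confirmed classification from step 3.

```python
import re

# Stand-in for a parsed workbook: sheet name -> list of row dictionaries.
workbook = {
    "Members": [
        {"Name": "Jane Example", "NI Number": "QQ123456C", "Pension": 1200.50},
        {"Name": "John Sample", "NI Number": "QQ654321A", "Pension": 980.00},
        {"Name": "Jane Example", "NI Number": "QQ123456C", "Pension": 310.25},
    ]
}


def extract_structure(wb: dict) -> dict:
    """Step 1: return sheet names, headers and inferred types. No cell values."""
    structure = {}
    for sheet, rows in wb.items():
        first_row = rows[0] if rows else {}
        structure[sheet] = {h: type(v).__name__ for h, v in first_row.items()}
    return structure


def substitute(wb: dict, pii_columns: dict) -> tuple:
    """Step 4: replace PII values with synthetic tokens.
    The same original value always maps to the same token."""
    substitution_map = {}  # kept by the caller; never sent to the LLM
    anonymised = {}
    for sheet, rows in wb.items():
        new_rows = []
        for row in rows:
            new_row = dict(row)
            for column in pii_columns.get(sheet, []):
                key = (column, row[column])
                if key not in substitution_map:
                    prefix = column.upper().replace(" ", "_")
                    substitution_map[key] = f"{prefix}_{len(substitution_map) + 1:04d}"
                new_row[column] = substitution_map[key]
            new_rows.append(new_row)
        anonymised[sheet] = new_rows
    return anonymised, substitution_map


# Simplified pattern (assumption): two letters, six digits, one suffix letter A-D.
NINO_PATTERN = re.compile(r"\b[A-Z]{2}\d{6}[A-D]\b")


def residual_pii(wb: dict) -> list:
    """Step 5: list (sheet, row index, header) for every value matching a known pattern."""
    hits = []
    for sheet, rows in wb.items():
        for index, row in enumerate(rows):
            for header, value in row.items():
                if isinstance(value, str) and NINO_PATTERN.search(value):
                    hits.append((sheet, index, header))
    return hits


structure = extract_structure(workbook)  # all the classifier would see
anonymised, substitution_map = substitute(workbook, {"Members": ["Name", "NI Number"]})
assert residual_pii(anonymised) == []  # gate: release only if nothing matches
```

The regex gate in this sketch only covers the one pattern it knows about. Free-text values such as names are protected by the classification and substitution steps, not by the final check.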

Questions we'd like you to ask.

Bring the edge cases. Bring the hard governance questions. Bring the bespoke transitional clause that has defeated three vendors. We would like you to try.