Dynes Insights · Evaluation Framework

AI Output Quality Rubric

A structured, seven-dimension framework for evaluating and guiding AI-generated outputs in high-stakes professional contexts.

Version 1.1 · Field Testing

Purpose

This rubric provides a structured, repeatable evaluation framework for AI-generated outputs. It assesses outputs against seven defined quality dimensions and operates in two modes: Guide Mode, applied before or during creation, and Checker Mode, applied after production.

The framework rewards depth of thinking and genuine critical engagement over speed and surface polish. It generates honest, developmental feedback — not approval-seeking affirmation — and produces improvement data that accumulates into a learning system over time.

The rubric is domain-adaptable, but is designed primarily for high-stakes outputs: bid writing, strategic briefings, and coaching outputs.


Operating Modes

Guide Mode

Applied before or during the creation of a high-stakes output. The seven dimensions act as active quality standards during creation, with a brief self-evaluation at completion.

Checker Mode

Applied after an output has been produced. Always begins with five clarifying questions before evaluation proceeds. Evaluation is calibrated to the answers given.

Checker Mode — Clarifying Questions

Before any evaluation begins, the following five questions are asked together in a single exchange:

QUESTION 01
Full or light touch review?
Full: all seven dimensions
Light touch: three key dimensions

QUESTION 02
Final or mid-development review?
Final: full standard, no concession
Mid-development: developmental framing

QUESTION 03
How confident are you in the output's standard?
Not confident
Confident
Very confident

QUESTION 04
Would you like a PDF report of this evaluation?
Yes: generated automatically
No: conversational delivery only

QUESTION 05
What genre should the rubric assess against?
Formal development
Conversational opinion
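
As a rough illustration, the five answers could be captured in a single settings record before evaluation starts. This is a minimal sketch, not part of the rubric itself: the class names, field names, and the `dimension_count` helper are all hypothetical. Only the option labels and the seven-versus-three dimension counts come from the questions above, and the rubric does not say which three dimensions a light touch review covers, so the sketch does not either.

```python
from dataclasses import dataclass
from enum import Enum


# All names below are hypothetical; only the option labels come from the rubric.
class Depth(Enum):
    FULL = "Full"
    LIGHT_TOUCH = "Light touch"


class Stage(Enum):
    FINAL = "Final"
    MID_DEVELOPMENT = "Mid-development"


class Confidence(Enum):
    NOT_CONFIDENT = "Not confident"
    CONFIDENT = "Confident"
    VERY_CONFIDENT = "Very confident"


class Genre(Enum):
    FORMAL_DEVELOPMENT = "Formal development"
    CONVERSATIONAL_OPINION = "Conversational opinion"


@dataclass(frozen=True)
class CheckerSettings:
    """Answers to the five clarifying questions, collected in one exchange."""
    depth: Depth
    stage: Stage
    confidence: Confidence
    pdf_report: bool
    genre: Genre

    def dimension_count(self) -> int:
        # Full review scores all seven dimensions; light touch scores three.
        return 7 if self.depth is Depth.FULL else 3


if __name__ == "__main__":
    settings = CheckerSettings(
        depth=Depth.LIGHT_TOUCH,
        stage=Stage.MID_DEVELOPMENT,
        confidence=Confidence.CONFIDENT,
        pdf_report=False,
        genre=Genre.FORMAL_DEVELOPMENT,
    )
    print(settings.dimension_count())
```

Freezing the record reflects the idea that evaluation is calibrated once to the answers given, rather than adjusted part-way through.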

Evaluation Scale

Every dimension is scored out of 10, mapping to one of five bands:

Band (Score): Descriptor
Insufficient (0–2): Fails to meet the standard; fundamental issues present
Partial (3–4): Something is present but significantly incomplete; fundamental gaps remain
Adequate (5–6): Passes the minimum threshold; no more than that
Capable (7–8): Genuinely strong; minor improvements possible but not essential
Exemplary (9–10): Genuine excellence; sets the standard for this output type

Design principle: No score below 9 should feel comfortable to receive. Adequate is not a compliment. Only Exemplary carries unambiguous positive weight, and it is earned precisely because the bands beneath it are honest.
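
The score-to-band mapping above can be sketched as a small lookup. This is an illustrative sketch only: the names `BANDS` and `score_to_band` are hypothetical, and it assumes whole-number scores, since the rubric does not say whether fractional scores are allowed.

```python
# Band boundaries are taken from the scale above.
# The names BANDS and score_to_band are hypothetical.
BANDS = [
    (0, 2, "Insufficient"),
    (3, 4, "Partial"),
    (5, 6, "Adequate"),
    (7, 8, "Capable"),
    (9, 10, "Exemplary"),
]


def score_to_band(score: int) -> str:
    """Return the band name for a whole-number score from 0 to 10."""
    # Assumption: scores are integers; bool is rejected because it subclasses int.
    if isinstance(score, bool) or not isinstance(score, int):
        raise TypeError("score must be an integer")
    for low, high, band in BANDS:
        if low <= score <= high:
            return band
    raise ValueError("score must be between 0 and 10")


if __name__ == "__main__":
    for s in (2, 6, 9):
        print(s, score_to_band(s))
```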

The Seven Dimensions

DIMENSION 01
Fit to Context

Does the output demonstrate genuine understanding of the specific situation, audience, and purpose — or does it read as generically applicable?

Insufficient (0–2): Generic; could apply to any situation; no meaningful adaptation to context
Partial (3–4): Acknowledges context superficially; adaptation is surface-level or formulaic
Adequate (5–6): Clearly shaped by the stated context; appropriate to audience and purpose; no significant mismatch
Capable (7–8): Demonstrates genuine understanding including unstated contextual factors; calibrated to the specific situation, not just the brief
Exemplary (9–10): Demonstrates insight into deeper contextual dynamics; anticipates what the context requires beyond the explicit brief; could only have been written for this situation

DIMENSION 02
Evidence and Grounding

Are claims, assertions, and recommendations supported by concrete evidence, examples, or reasoning — or asserted without foundation?

Insufficient (0–2): Claims made without any support; assertions presented as facts; no evidence, examples, or reasoning offered
Partial (3–4): Some grounding present but inconsistent; key claims left unsupported; evidence where present is vague or weak
Adequate (5–6): Claims broadly supported; evidence present and relevant; no significant unsupported assertions
Capable (7–8): Claims consistently supported with specific, well-chosen evidence; uncertainty acknowledged where it genuinely exists
Exemplary (9–10): Evidence is precise and proportionate; the distinction between established fact, reasoned inference, and acknowledged uncertainty is consistently maintained throughout

DIMENSION 03
Analytical Depth

Does the output generate insight and implication, or does it describe and summarise without advancing understanding?

Insufficient (0–2): Purely descriptive; restates information without analysis; nothing new is generated
Partial (3–4): Some analytical movement but conclusions are surface-level, obvious, or not meaningfully derived from the content
Adequate (5–6): Clear line of analysis present; insight generated; implications drawn; the output moves beyond description
Capable (7–8): Analysis is penetrating; non-obvious connections surfaced; the reader's understanding is meaningfully advanced
Exemplary (9–10): Reaches the underlying dynamics; surfaces what others would miss; the analysis itself is the value of the output

DIMENSION 04
Purposeful Structure

Does the structure serve the purpose of the output — guiding the reader efficiently toward understanding or decision — or does it impose form without function?

Insufficient (0–2): Structure absent, incoherent, or actively misleading; the reader cannot follow the logic
Partial (3–4): A logical sequence exists but structure does not reinforce the core argument; some sections add noise rather than signal
Adequate (5–6): Structure clearly serves the output's purpose; hierarchy and sequencing are coherent; the reader is guided reliably
Capable (7–8): Structure and argument are well-integrated; organisation itself contributes to the output's persuasiveness or clarity
Exemplary (9–10): Structure and content are inseparable; the architecture of the output communicates meaning; nothing could be moved or removed without loss

DIMENSION 05
Appropriate Register

Is the tone, language level, and relational posture calibrated correctly to the audience, domain, and moment — and does it remain consistent?

Insufficient (0–2): Significant mismatch between register and context; undermines credibility or utility; may actively alienate the intended audience
Partial (3–4): Broadly appropriate register but with inconsistencies or misjudgements that weaken the output
Adequate (5–6): Register consistently appropriate; language level and tone well-matched to audience and purpose throughout
Capable (7–8): Register precisely calibrated; shifts intentionally when context requires; enhances rather than merely supports the content
Exemplary (9–10): Register is a positive contributor to impact; the voice is distinctive, earned, and exactly right for the moment; the reader feels addressed rather than processed

DIMENSION 06
Critical Integrity

Does the output reflect honest, proportionate, and developmentally useful assessment — free from both approval-seeking affirmation and adversarial challenge?

Insufficient (0–2): Sycophantic or adversarial; validates uncritically to please, or challenges without developmental intent
Partial (3–4): Broadly honest but with softened critique, significant omissions of concern, or disproportionate framing of issues
Adequate (5–6): Honest and proportionate; concerns raised clearly and without burying; challenge delivered with developmental purpose
Capable (7–8): Engages as a trusted peer; identifies what needs to be said; maintains analytical independence whilst remaining constructive
Exemplary (9–10): Identifies what others might miss or avoid saying; maintains complete honesty without compromising the relationship; useful precisely because it does not seek approval

DIMENSION 07
Evaluative Judgement

Does the output — and the process that produced it — demonstrate that the right level of critical engagement was applied at the right points?

Insufficient (0–2): No evaluative engagement evident; outputs accepted or rejected without apparent judgement
Partial (3–4): Some evaluative engagement but poorly calibrated; significant points accepted without scrutiny, or challenge applied where it adds no value
Adequate (5–6): Evaluative judgement present and broadly sound; key assumptions tested; acceptance where appropriate is active rather than passive
Capable (7–8): Strong discrimination between what warrants challenge and what does not; each evaluative decision can be justified
Exemplary (9–10): Evaluative judgement is precise and generative; active acceptance of strong outputs demonstrates discriminating quality of thought equal to any challenge

Evaluation Format

For each of the seven dimensions, evaluation is structured as follows:

DIMENSION [N]: [NAME]
Score: [X]/10 — [Band Descriptor]

Strengths:
What the output does well against this dimension — specific and evidence-based

Weaknesses:
Where the output falls short against this dimension — specific and evidence-based

Areas for Development:
The single highest-value improvement that would move this dimension's score upward

After all seven dimensions, a holistic AI Voice assessment is produced, followed by an Overall Summary identifying the two or three most significant findings and the single highest-priority development area.
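
The per-dimension format above could be produced by a small renderer along these lines. This is a sketch under stated assumptions rather than a prescribed implementation: the `DimensionEvaluation` record and `render` function are hypothetical names, the two-digit dimension number follows the "DIMENSION 01" headings used earlier, and the example values are placeholders, not real evaluation results.

```python
from dataclasses import dataclass


@dataclass
class DimensionEvaluation:
    """One dimension's result; the record and its field names are hypothetical."""
    number: int       # 1 to 7
    name: str
    score: int        # 0 to 10
    band: str         # e.g. "Capable"
    strengths: str
    weaknesses: str
    development: str  # the single highest-value improvement


def render(ev: DimensionEvaluation) -> str:
    """Lay out one dimension's evaluation in the rubric's stated format."""
    lines = [
        # Assumption: two-digit numbering, as in the "DIMENSION 01" headings.
        f"DIMENSION {ev.number:02d}: {ev.name}",
        # \u2014 is the dash character used in the rubric's Score line.
        f"Score: {ev.score}/10 \u2014 {ev.band}",
        "",
        "Strengths:",
        ev.strengths,
        "",
        "Weaknesses:",
        ev.weaknesses,
        "",
        "Areas for Development:",
        ev.development,
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = DimensionEvaluation(
        number=1,
        name="Fit to Context",
        score=7,
        band="Capable",
        strengths="(specific, evidence-based strengths)",
        weaknesses="(specific, evidence-based weaknesses)",
        development="(single highest-value improvement)",
    )
    print(render(example))
```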


Critical Integrity in Application

When applying this rubric, the evaluator — whether human or AI — must observe the following principles: