Dynes Insights · Rubric v1.1

Output Evaluator Rubric

A seven-dimension framework for evaluating the quality of AI-generated outputs in professional contexts. Honest by design. Developmental in intent.

Use the rubric

Evaluate an output

Read about the framework below, then use it. Two evaluation routes are available: score an output yourself using the self-assessment tool, or submit it for AI-generated evaluation against the full rubric.

Human scored
Self Assessment

Work through each of the seven dimensions yourself. Select a score, add your notes, and generate a structured PDF evaluation record. No account required. Takes around fifteen to twenty minutes.

Start self assessment (free)

AI scored
AI Assessment

Paste or upload your output and Claude evaluates it against all seven dimensions. It asks five clarifying questions conversationally, then produces a full scored evaluation with a PDF report.

Start AI assessment (gated access)

Scoring Scale

Five bands, honestly labelled

Every dimension is scored out of ten. Scores map to five bands. The descriptions are intentionally honest. Adequate is not a compliment, and Exemplary is earned precisely because the bands beneath it are not.

Insufficient
0–2
Fails to meet the standard. Fundamental issues present. Not a starting point, but an absence.
Partial
3–4
Something is present but significantly incomplete. The right instinct without the right execution.
Adequate
5–6
Passes the minimum threshold. No more than that. Functional but not strong.
Capable
7–8
Genuinely strong. Minor improvements possible but not essential. This is good work.
Exemplary
9–10
Genuine excellence. Sets the standard. Nothing more could reasonably be asked of it.

While Exemplary is the gold standard, it is deliberately hard to achieve. In the vast majority of cases, a score of 8 (a high Capable) demonstrates high-level performance. Adequate means the output passed a minimum threshold and nothing more. Only Exemplary carries unambiguous positive weight, and it is earned precisely because the bands beneath it are honest.
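
The band boundaries above can be expressed as a simple lookup. The Python sketch below is illustrative only: the function name `band_for_score` is hypothetical, and it assumes scores are whole numbers from 0 to 10, which the scale implies but does not state outright.

```python
# Band boundaries as listed in the scoring scale: (lowest score, highest score, band name).
BANDS = [
    (0, 2, "Insufficient"),
    (3, 4, "Partial"),
    (5, 6, "Adequate"),
    (7, 8, "Capable"),
    (9, 10, "Exemplary"),
]


def band_for_score(score: int) -> str:
    """Return the band name for a whole-number score from 0 to 10."""
    # Assumption: whole numbers only. bool is rejected because it is an int subclass.
    if isinstance(score, bool) or not isinstance(score, int):
        raise TypeError("score must be a whole number")
    for low, high, name in BANDS:
        if low <= score <= high:
            return name
    raise ValueError("score must be between 0 and 10")


print(band_for_score(8))  # prints "Capable"
```

Keeping the boundaries in one table, not in scattered `if` statements, makes it harder for a self-assessment tool and a report generator to drift apart on where one band ends and the next begins.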

The Seven Dimensions

What the rubric evaluates

Each dimension examines a different aspect of output quality. Not just whether the output reads well, but whether it is doing what it should at every level: contextual, analytical, structural, and relational.

1. Fit to Context
Does the output demonstrate genuine understanding of the specific situation, audience, and purpose, or does it read as generically applicable?
Band | Score | Descriptor
Insufficient | 0–2 | Generic; could apply to any situation; no meaningful adaptation to context; the specific brief has not been understood or used.
Partial | 3–4 | Acknowledges context superficially; adaptation is surface-level or formulaic.
Adequate | 5–6 | Clearly shaped by the stated context; appropriate to audience and purpose as given; no significant mismatch.
Capable | 7–8 | Demonstrates genuine understanding including unstated contextual factors; calibrated to the specific situation, not just the brief.
Exemplary | 9–10 | Demonstrates insight into deeper contextual dynamics; anticipates what the context requires beyond the explicit brief; could only have been written for this situation.

2. Evidence and Grounding
Are claims, assertions, and recommendations supported by concrete evidence, examples, or reasoning, or are they asserted without foundation?
Band | Score | Descriptor
Insufficient | 0–2 | Claims made without any support; assertions presented as facts; no evidence, examples, or reasoning offered.
Partial | 3–4 | Some grounding present but inconsistent; key claims left unsupported; evidence where present is vague or weak.
Adequate | 5–6 | Claims broadly supported; evidence present and relevant; no significant unsupported assertions.
Capable | 7–8 | Claims consistently supported with specific, well-chosen evidence; uncertainty acknowledged where it genuinely exists.
Exemplary | 9–10 | Evidence is precise and proportionate; the distinction between established fact, reasoned inference, and acknowledged uncertainty is consistently maintained throughout.

3. Analytical Depth
Does the output generate insight and implication, or does it describe and summarise without advancing understanding?
Band | Score | Descriptor
Insufficient | 0–2 | Purely descriptive; restates information without analysis; nothing new is generated.
Partial | 3–4 | Some analytical movement but conclusions are surface-level, obvious, or not meaningfully derived from the content.
Adequate | 5–6 | Clear line of analysis present; insight generated; implications drawn; the output moves beyond description.
Capable | 7–8 | Analysis is penetrating; non-obvious connections surfaced; the reader's understanding is meaningfully advanced.
Exemplary | 9–10 | Reaches the underlying dynamics; surfaces what others would miss; the analysis itself is the value of the output, not merely a vehicle for presenting information.

4. Purposeful Structure
Does the structure serve the purpose of the output, guiding the reader efficiently toward understanding or decision, or does it impose form without function?
Band | Score | Descriptor
Insufficient | 0–2 | Structure absent, incoherent, or actively misleading; the reader cannot follow the logic.
Partial | 3–4 | A logical sequence exists but structure does not reinforce the core argument or purpose; some sections add noise rather than signal.
Adequate | 5–6 | Structure clearly serves the output's purpose; hierarchy and sequencing are coherent; the reader is guided reliably.
Capable | 7–8 | Structure and argument are well-integrated; organisation itself contributes to the output's persuasiveness or clarity.
Exemplary | 9–10 | Structure and content are inseparable; the architecture of the output communicates meaning; nothing could be moved or removed without loss.

5. Appropriate Register
Are the tone, language level, and relational posture calibrated correctly to the audience, domain, and moment, and do they remain consistent?
Band | Score | Descriptor
Insufficient | 0–2 | Significant mismatch between register and context; undermines credibility or utility; may actively alienate the intended audience.
Partial | 3–4 | Broadly appropriate register but with inconsistencies or misjudgements that weaken the output.
Adequate | 5–6 | Register consistently appropriate; language level and tone well-matched to audience and purpose throughout.
Capable | 7–8 | Register precisely calibrated; shifts intentionally when context requires; enhances rather than merely supports the content.
Exemplary | 9–10 | Register is a positive contributor to impact; the voice is distinctive, earned, and exactly right for the moment; the reader feels addressed rather than processed.

6. Critical Integrity
Does the output reflect honest, proportionate, and developmentally useful assessment, free from both approval-seeking affirmation and adversarial challenge?
Band | Score | Descriptor
Insufficient | 0–2 | Sycophantic or adversarial; either validates uncritically to please, or challenges without developmental intent; the output serves the wrong purpose.
Partial | 3–4 | Broadly honest but with softened critique, significant omissions of concern, or disproportionate framing of issues.
Adequate | 5–6 | Honest and proportionate; concerns raised clearly and without burying; challenge delivered with developmental purpose.
Capable | 7–8 | Engages as a trusted peer; identifies what needs to be said; maintains analytical independence whilst remaining constructive.
Exemplary | 9–10 | Identifies what others might miss or avoid saying; maintains complete honesty without compromising the relationship; useful precisely because it does not seek approval.

7. Evaluative Judgement
Does the output, and the process that produced it, demonstrate that the right level of critical engagement was applied at the right points?

The quality, accuracy, and generative power of evaluative judgement are what matter, not its timing. Genuine reflection that arrives after deliberation and materially improves the outcome is of higher value than rapid challenge that produces superficial change. This dimension does not reward speed of challenge or penalise considered acceptance.

Band | Score | Descriptor
Insufficient | 0–2 | No evaluative engagement evident; outputs accepted or rejected without apparent judgement; challenge and acceptance both appear arbitrary.
Partial | 3–4 | Some evaluative engagement but poorly calibrated; significant points accepted without scrutiny, or challenge applied where it adds no value.
Adequate | 5–6 | Evaluative judgement present and broadly sound; key assumptions tested; acceptance where appropriate is active rather than passive.
Capable | 7–8 | Strong discrimination between what warrants challenge and what does not; each evaluative decision can be justified; reflection is visible in the quality of interventions.
Exemplary | 9–10 | Evaluative judgement is precise and generative; active acceptance of strong outputs demonstrates discriminating quality of thought equal to any challenge; the thinking process itself is the standard.

Holistic Measure

Level of AI Voice

After the seven dimensions are scored, the evaluation closes with a holistic measure of AI Voice. This is not a quality dimension in the same sense as the others. It is an assessment of authorship.

The question it asks is: to what degree does the output sound generated rather than authored? Where is AI voice present, where is it absent, and what is driving it?

For this measure, lower is better. A score of two or below means the output reads as genuinely human-authored. A score of nine or ten means it is unmistakably AI-generated throughout.

AI voice is not always a problem. In some contexts it is neutral or acceptable. But it is always worth naming, because the presence of AI voice often signals that the human engagement that produced the output was passive rather than active.

AI Voice: Bands
Band | Score | Descriptor
Sounds human | 0–2 | The output reads as genuinely authored by a person. AI involvement is not detectable in voice or patterning.
Mostly human | 3–4 | Largely human in voice. Occasional AI patterning present but does not dominate.
Mixed | 5–6 | A blend of human and AI voice. Neither fully dominates; the output shifts between registers.
Mostly AI | 7–8 | AI voice is the dominant register. Formulaic structures, hedging language, or generic patterning are clearly present.
AI throughout | 9–10 | Unmistakably AI-generated throughout. Mechanical structure, excessive hedging, and AI patterns are pervasive.
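
Because this measure runs in the opposite direction to the seven quality dimensions, it is easy to misread a high number as a good result. The Python sketch below keeps the AI Voice bands in their own lookup and makes the "lower is better" comparison explicit. The names `ai_voice_band` and `reads_more_human` are hypothetical, and whole-number scores from 0 to 10 are assumed.

```python
# AI Voice bands as listed above: (lowest score, highest score, band name).
AI_VOICE_BANDS = [
    (0, 2, "Sounds human"),
    (3, 4, "Mostly human"),
    (5, 6, "Mixed"),
    (7, 8, "Mostly AI"),
    (9, 10, "AI throughout"),
]


def ai_voice_band(score: int) -> str:
    """Return the AI Voice band for a whole-number score from 0 to 10."""
    for low, high, name in AI_VOICE_BANDS:
        if low <= score <= high:
            return name
    raise ValueError("score must be a whole number between 0 and 10")


def reads_more_human(score_a: int, score_b: int) -> bool:
    """Return True when score_a indicates less AI voice than score_b."""
    # Lower is better on this measure, the reverse of the seven quality dimensions.
    return score_a < score_b


print(ai_voice_band(7))        # prints "Mostly AI"
print(reads_more_human(2, 7))  # prints "True"
```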

Principles

How the rubric is applied

The rubric is honest by design. These principles govern how evaluations are conducted, whether by AI or by a human evaluator working through the dimensions independently.

Honest before kind

Developmental feedback serves the person better than comfortable feedback. The delivery can be softened where appropriate. The content cannot.

Specific, not vague

Weaknesses are located precisely in the output. "Could be stronger" is not useful. Where it fell short and what it cost the output: that is useful.

Concerns before praise

If a dimension scores Partial, that is named clearly before noting what worked. Concerns are not buried at the end of paragraphs that lead with strengths.

No manufactured critique

If a dimension genuinely scores Exemplary, it is recorded as Exemplary. Inventing critique to appear rigorous is its own failure of the framework's principles.

Dimensional independence

A strong overall impression does not inflate weak dimensions. A weak overall impression does not deflate strong ones. Each dimension stands on its own evidence.
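
One way to honour this principle in a tool is to report each dimension's band from its own score and nothing else. The Python sketch below is an assumption about how that might look, not a description of the actual evaluator: `dimension_report` and `band_for_score` are hypothetical names, and the sketch deliberately computes no average or overall score, since the rubric does not describe one.

```python
# The seven dimensions, in the order the rubric presents them.
DIMENSIONS = (
    "Fit to Context",
    "Evidence and Grounding",
    "Analytical Depth",
    "Purposeful Structure",
    "Appropriate Register",
    "Critical Integrity",
    "Evaluative Judgement",
)

# Band boundaries from the scoring scale: (lowest score, highest score, band name).
BANDS = [(0, 2, "Insufficient"), (3, 4, "Partial"), (5, 6, "Adequate"),
         (7, 8, "Capable"), (9, 10, "Exemplary")]


def band_for_score(score: int) -> str:
    """Return the band name for a whole-number score from 0 to 10."""
    for low, high, name in BANDS:
        if low <= score <= high:
            return name
    raise ValueError("score must be a whole number between 0 and 10")


def dimension_report(scores: dict) -> dict:
    """Map each dimension to its band, using only that dimension's own score."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError("missing scores for: " + ", ".join(missing))
    # No averaging or smoothing: a strong neighbour cannot lift a weak dimension.
    return {d: band_for_score(scores[d]) for d in DIMENSIONS}


example = dict.fromkeys(DIMENSIONS, 8)
example["Analytical Depth"] = 3
print(dimension_report(example)["Analytical Depth"])  # prints "Partial"
```

In the example, six dimensions sit at a high Capable, yet Analytical Depth is still reported as Partial.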

Genre calibration

Formal professional outputs and conversational opinion pieces are assessed differently on four dimensions. The rubric adjusts to what the output is, not what the evaluator prefers.