Dynes Insights · Rubric v1.1

Output Evaluator
Rubric

A seven-dimension framework for evaluating the quality of AI-generated outputs in professional contexts. Honest by design. Developmental in intent.

Example evaluation
See the rubric applied to a real output

A complete evaluation of a practitioner's thought leadership article at mid-development stage. Published with the author's approval.

Output type Thought leadership article
Stage Mid-development
Excerpt 2 of 8 measures
Full report PDF download
View excerpt and download report
Use the rubric

Evaluate an output

Read about the framework below, then use it. Two evaluation routes are available: score an output yourself using the self-assessment tool, or submit it for AI-generated evaluation against the full rubric.

Human scored
Self-Assessment

Work through each of the seven dimensions yourself. Select a score, add your notes, and generate a structured PDF evaluation record. No account required. Takes around fifteen to twenty minutes.

Start self-assessment · Free
AI scored
AI Assessment

Paste or upload your output and Claude evaluates it against all seven dimensions. Claude asks five clarifying questions conversationally, then produces a full scored evaluation with a PDF report.

Start AI assessment · Gated access
Scoring Scale

Five bands, honestly labelled

Every dimension is scored out of ten. Scores map to five bands. The descriptions are intentionally honest. Adequate is not a compliment, and Exemplary is earned precisely because the bands beneath it are not.

Insufficient
0–2
Fails to meet the standard. Fundamental issues present. Not a starting point, but an absence.
Partial
3–4
Something is present but significantly incomplete. The right instinct without the right execution.
Adequate
5–6
Passes the minimum threshold. No more than that. Functional but not strong.
Capable
7–8
Genuinely strong. Minor improvements possible but not essential. This is good work.
Exemplary
9–10
Genuine excellence. Sets the standard. Nothing more could reasonably be asked of it.
Exemplary is the gold standard, and it is deliberately hard to achieve. In the vast majority of cases, a score of 8 (a high Capable) represents high-level performance. Adequate means the output passed a minimum threshold and nothing more. Only Exemplary carries unambiguous positive weight.
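The score-to-band mapping above is mechanical enough to express directly. A minimal sketch in Python (the `band_for` function and `BANDS` constant are illustrative names, not part of any published rubric tooling):

```python
# Illustrative sketch: map an integer dimension score (0-10) to its band.
# Band names and score ranges are taken from the scale above.
BANDS = [
    (0, 2, "Insufficient"),
    (3, 4, "Partial"),
    (5, 6, "Adequate"),
    (7, 8, "Capable"),
    (9, 10, "Exemplary"),
]

def band_for(score: int) -> str:
    """Return the band label for a whole-number score from 0 to 10."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    for low, high, label in BANDS:
        if low <= score <= high:
            return label
```

Under this mapping, `band_for(8)` returns "Capable", consistent with the rubric's treatment of 8 as a high Capable rather than a near-Exemplary.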

The Seven Dimensions

What the rubric evaluates

Each dimension examines a different aspect of output quality. Not just whether the output reads well, but whether it is doing what it should at every level: contextual, analytical, structural, and relational.

1
Fit to Context
Does the output demonstrate genuine understanding of the specific situation, audience, and purpose, or does it read as generically applicable?
Band · Score · Descriptor
Insufficient · 0–2 · Generic; could apply to any situation; no meaningful adaptation to context; the specific brief has not been understood or used.
Partial · 3–4 · Acknowledges context superficially; adaptation is surface-level or formulaic.
Adequate · 5–6 · Clearly shaped by the stated context; appropriate to audience and purpose as given; no significant mismatch.
Capable · 7–8 · Demonstrates genuine understanding including unstated contextual factors; calibrated to the specific situation, not just the brief.
Exemplary · 9–10 · Demonstrates insight into deeper contextual dynamics; anticipates what the context requires beyond the explicit brief; could only have been written for this situation.
2
Evidence and Grounding
Are claims, assertions, and recommendations supported by concrete evidence, examples, or reasoning, or are they asserted without foundation?
Band · Score · Descriptor
Insufficient · 0–2 · Claims made without any support; assertions presented as facts; no evidence, examples, or reasoning offered.
Partial · 3–4 · Some grounding present but inconsistent; key claims left unsupported; evidence where present is vague or weak.
Adequate · 5–6 · Claims broadly supported; evidence present and relevant; no significant unsupported assertions.
Capable · 7–8 · Claims consistently supported with specific, well-chosen evidence; uncertainty acknowledged where it genuinely exists.
Exemplary · 9–10 · Evidence is precise and proportionate; the distinction between established fact, reasoned inference, and acknowledged uncertainty is consistently maintained throughout.
3
Analytical Depth
Does the output generate insight and implication, or does it describe and summarise without advancing understanding?
Band · Score · Descriptor
Insufficient · 0–2 · Purely descriptive; restates information without analysis; nothing new is generated.
Partial · 3–4 · Some analytical movement but conclusions are surface-level, obvious, or not meaningfully derived from the content.
Adequate · 5–6 · Clear line of analysis present; insight generated; implications drawn; the output moves beyond description.
Capable · 7–8 · Analysis is penetrating; non-obvious connections surfaced; the reader's understanding is meaningfully advanced.
Exemplary · 9–10 · Reaches the underlying dynamics; surfaces what others would miss; the analysis itself is the value of the output, not merely a vehicle for presenting information.
4
Purposeful Structure
Does the structure serve the purpose of the output, guiding the reader efficiently toward understanding or decision, or does it impose form without function?
Band · Score · Descriptor
Insufficient · 0–2 · Structure absent, incoherent, or actively misleading; the reader cannot follow the logic.
Partial · 3–4 · A logical sequence exists but structure does not reinforce the core argument or purpose; some sections add noise rather than signal.
Adequate · 5–6 · Structure clearly serves the output's purpose; hierarchy and sequencing are coherent; the reader is guided reliably.
Capable · 7–8 · Structure and argument are well-integrated; organisation itself contributes to the output's persuasiveness or clarity.
Exemplary · 9–10 · Structure and content are inseparable; the architecture of the output communicates meaning; nothing could be moved or removed without loss.
5
Appropriate Register
Is the tone, language level, and relational posture calibrated correctly to the audience, domain, and moment, and does it remain consistent?
Band · Score · Descriptor
Insufficient · 0–2 · Significant mismatch between register and context; undermines credibility or utility; may actively alienate the intended audience.
Partial · 3–4 · Broadly appropriate register but with inconsistencies or misjudgements that weaken the output.
Adequate · 5–6 · Register consistently appropriate; language level and tone well-matched to audience and purpose throughout.
Capable · 7–8 · Register precisely calibrated; shifts intentionally when context requires; enhances rather than merely supports the content.
Exemplary · 9–10 · Register is a positive contributor to impact; the voice is distinctive, earned, and exactly right for the moment; the reader feels addressed rather than processed.
6
Critical Integrity
Does the output reflect honest, proportionate, and developmentally useful assessment, free from both approval-seeking affirmation and adversarial challenge?
Band · Score · Descriptor
Insufficient · 0–2 · Sycophantic or adversarial; either validates uncritically to please, or challenges without developmental intent; the output serves the wrong purpose.
Partial · 3–4 · Broadly honest but with softened critique, significant omissions of concern, or disproportionate framing of issues.
Adequate · 5–6 · Honest and proportionate; concerns raised clearly and without burying; challenge delivered with developmental purpose.
Capable · 7–8 · Engages as a trusted peer; identifies what needs to be said; maintains analytical independence whilst remaining constructive.
Exemplary · 9–10 · Identifies what others might miss or avoid saying; maintains complete honesty without compromising the relationship; useful precisely because it does not seek approval.
7
Evaluative Judgement
Does the output, and the process that produced it, demonstrate that the right level of critical engagement was applied at the right points?

The quality, accuracy, and generative power of evaluative judgement are what matter, not its timing. Genuine reflection that arrives after deliberation and improves the outcome materially is of higher value than rapid challenge that produces superficial change. This dimension does not reward speed of challenge or penalise considered acceptance.

Band · Score · Descriptor
Insufficient · 0–2 · No evaluative engagement evident; outputs accepted or rejected without apparent judgement; challenge and acceptance both appear arbitrary.
Partial · 3–4 · Some evaluative engagement but poorly calibrated; significant points accepted without scrutiny, or challenge applied where it adds no value.
Adequate · 5–6 · Evaluative judgement present and broadly sound; key assumptions tested; acceptance where appropriate is active rather than passive.
Capable · 7–8 · Strong discrimination between what warrants challenge and what does not; each evaluative decision can be justified; reflection is visible in the quality of interventions.
Exemplary · 9–10 · Evaluative judgement is precise and generative; active acceptance of strong outputs demonstrates discriminating quality of thought equal to any challenge; the thinking process itself is the standard.

Holistic Measure

Level of AI Voice

After the seven dimensions are scored, the evaluation closes with a holistic measure of AI Voice. This is not a quality dimension in the same sense as the others. It is an assessment of authorship.

The question it asks is: to what degree does the output sound generated rather than authored? Where is AI voice present, where is it absent, and what is driving it?

For this measure, lower is better. A score of zero to two means the output reads as genuinely human-authored. A score of nine or ten means it is unmistakably AI-generated throughout.

AI voice is not always a problem. In some contexts it is neutral or acceptable. But it is always worth naming, because the presence of AI voice often signals that the human engagement that produced the output was passive rather than active.

AI Voice: Bands
Band · Score · Descriptor
Sounds human · 0–2 · The output reads as genuinely authored by a person. AI involvement is not detectable in voice or patterning.
Mostly human · 3–4 · Largely human in voice. Occasional AI patterning present but does not dominate.
Mixed · 5–6 · A blend of human and AI voice. Neither fully dominates; the output shifts between registers.
Mostly AI · 7–8 · AI voice is the dominant register. Formulaic structures, hedging language, or generic patterning are clearly present.
AI throughout · 9–10 · Unmistakably AI-generated throughout. Mechanical structure, excessive hedging, and AI patterns are pervasive.
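The AI Voice bands use the same 0–10 scale with the reading inverted: lower is better. A companion sketch under the same assumptions (the `ai_voice_band` name and `AI_VOICE_BANDS` constant are illustrative, not part of any published tooling):

```python
# Illustrative sketch: label an AI Voice score (0-10) with its band.
# For this holistic measure, lower scores indicate a more human voice.
AI_VOICE_BANDS = [
    (0, 2, "Sounds human"),
    (3, 4, "Mostly human"),
    (5, 6, "Mixed"),
    (7, 8, "Mostly AI"),
    (9, 10, "AI throughout"),
]

def ai_voice_band(score: int) -> str:
    """Return the AI Voice band label for a whole-number score from 0 to 10."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    for low, high, label in AI_VOICE_BANDS:
        if low <= score <= high:
            return label
```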

Principles

How the rubric is applied

The rubric is honest by design. These principles govern how evaluations are conducted, whether by AI or by a human evaluator working through the dimensions independently.

Honest before kind

Developmental feedback serves the person better than comfortable feedback. The delivery can be softened where appropriate. The content cannot.

Specific, not vague

Weaknesses are located precisely in the output. "Could be stronger" is not useful. Where it fell short and what it cost the output: that is useful.

Concerns before praise

If a dimension scores Partial, that is named clearly before noting what worked. Concerns are not buried at the end of paragraphs that lead with strengths.

No manufactured critique

If a dimension genuinely scores Exemplary, it is recorded as Exemplary. Inventing critique to appear rigorous is its own failure of the framework's principles.

Dimensional independence

A strong overall impression does not inflate weak dimensions. A weak overall impression does not deflate strong ones. Each dimension stands on its own evidence.

Genre calibration

Formal professional outputs and conversational opinion pieces are assessed differently on four dimensions. The rubric adjusts to what the output is, not what the evaluator prefers.


Dimension 3 of 7
Analytical Depth
Does the output generate insight and implication, or describe and summarise?
7/10
Capable
Strengths

This is the piece's strongest dimension. The central insight — that AI exposed rather than created the proxy problem — is genuinely non-obvious and is well-sustained across the early sections. The inversion of the gaming incentive is analytically sharp. The decision to position the doctoral model as destination rather than starting point shows structural analytical intelligence. The formative to summative bridge is the most developed analytical section in the piece and makes a claim that most practitioners will not have encountered in this form.


Weaknesses

The piece loses analytical momentum in its second half. The "What This Makes Possible" section drops into relatively predictable benefits language. "From passive subject to active designer" is a phrase that circulates in education discourse already. The risks section names the risks without interrogating them. The equity concern in particular deserved more analytical weight, both because it is a genuine vulnerability in the model and because the audience will raise it. The conclusion restates the argument rather than landing a final insight the reader has not already encountered.


Areas for development

The second half needs the same analytical standard applied to the first. The conclusion in particular needs a closing analytical move rather than a summary restatement. The equity risk deserves a more honest interrogation of whether the model, as currently articulated, has an adequate answer to it.

What a final standard would require

The analytical standard maintained in sections one through five is sustained through to the conclusion. The final paragraph leaves the reader with something they did not have when they started, not a summary of what they have just read.

Holistic Measure
Level of AI Voice
To what degree does the output sound generated rather than authored?
4/10
Mostly human
Assessment

The current draft carries a detectable AI voice in multiple sections, estimated at 35 to 40 percent of the text. This is significantly above the implicit sub-15 percent standard for thought leadership work of this kind, and it is the highest-priority development concern for the editing pass.

The AI signature appears most clearly in:

List-introduction structures. "The concerns cluster around three things" is a classic AI framing that signals a list is coming. The piece uses this pattern more than once.

Rhythmic removal sequences. "Strip the method away. Remove the examination, the observation, the portfolio..." is analytically effective but the rhythm is an AI cadence that an experienced editor will identify.

Formulaic benefit framing. "The benefits of the model are not abstract. They are observable at the level of the individual learner, the practitioner, and the system" is a structure that appears frequently in AI-generated professional writing.

Connective tissue passages. Several transitions between sections read as structural glue rather than genuine argument. "There is a further implication of this model that the sector has been circling for some time" is the clearest example.


Areas for development

The humanising pass needs to be systematic, not selective. Every section should be reviewed for AI cadence with the target of reducing detectable AI voice to below 15 percent. The author's voice in the collaborative exchanges is more distinctive than the voice in the drafted sections. The editing pass should move the drafted text toward the conversational register the author used when pushing back and challenging during ideation.

Overall Summary
Three findings + priority
01
The central argument is the piece's most important achievement at v0.1. The insight that AI exposed rather than created the proxy problem is genuinely differentiating, and the analytical development of it through the gaming counter, marking integrity, and formative-to-summative sections is the strongest work in the piece. This is the spine the editing pass must protect.
02
The AI voice level is the most urgent development priority. At an estimated 35 to 40 percent, the piece does not yet read as the author's thought leadership. It reads as AI-assisted drafting. That is not the standard the piece needs to reach its audience or establish the credibility the argument deserves.
03
The evidence architecture is present but shallow. Names are doing the work that findings should be doing. The editing pass needs to go back into each reference and extract a specific claim that does genuine argumentative work, or reduce the reference to a passing acknowledgement rather than a substantive anchor.
Priority
The AI voice reduction pass. Everything else is secondary to establishing the author's voice as the authorial presence in the piece. Until that is done, no other editing decision can be fully evaluated.

This excerpt covers two of eight measures. The full report includes all seven dimensions, the holistic AI Voice assessment, a complete score summary, and the full evaluation record.

Download full evaluation report