Purpose
This rubric provides a structured, repeatable evaluation framework for AI-generated outputs. It is designed to assess outputs against seven defined quality dimensions, operating in two modes — Guide Mode before or during creation, and Checker Mode after production.
The framework rewards depth of thinking and genuine critical engagement over speed and surface polish. It generates honest, developmental feedback — not approval-seeking affirmation — and produces improvement data that accumulates into a learning system over time.
The rubric is domain-adaptable, but is designed primarily for high-stakes outputs: bid writing, strategic briefings, and coaching outputs.
Operating Modes
Guide Mode is applied before or during the creation of a high-stakes output. The seven dimensions act as active quality standards during creation, with a brief self-evaluation at completion.
Checker Mode is applied after an output has been produced. It always begins with five clarifying questions before evaluation proceeds, and the evaluation is calibrated to the answers given.
Checker Mode — Clarifying Questions
Before any evaluation begins, the following five questions are asked together in a single exchange:
Evaluation Scale
Each dimension is scored from 0 to 10, and each score maps to one of five bands:
| Band | Score | Descriptor |
|---|---|---|
| Insufficient | 0–2 | Fails to meet the standard; fundamental issues present |
| Partial | 3–4 | Something is present but significantly incomplete; fundamental gaps remain |
| Adequate | 5–6 | Passes the minimum threshold; no more than that |
| Capable | 7–8 | Genuinely strong; minor improvements possible but not essential |
| Exemplary | 9–10 | Genuine excellence; sets the standard for this output type |
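The score-to-band mapping in the table can be sketched as a small lookup. This is a minimal Python illustration, not part of the rubric itself: the names `BANDS` and `band_for_score` are hypothetical, and it assumes scores are whole numbers from 0 to 10, which the table implies but does not state.

```python
# Hypothetical lookup table mirroring the Evaluation Scale above.
# Each entry is (lowest score, highest score, band name).
BANDS = [
    (0, 2, "Insufficient"),
    (3, 4, "Partial"),
    (5, 6, "Adequate"),
    (7, 8, "Capable"),
    (9, 10, "Exemplary"),
]


def band_for_score(score: int) -> str:
    """Return the band name for a whole-number score between 0 and 10."""
    # Assumption: only integer scores are valid; bool is rejected explicitly
    # because Python treats True/False as integers.
    if isinstance(score, bool) or not isinstance(score, int):
        raise TypeError("score must be a whole number")
    for low, high, name in BANDS:
        if low <= score <= high:
            return name
    raise ValueError("score must be between 0 and 10")


if __name__ == "__main__":
    for example in (2, 6, 9):
        print(f"{example}/10 maps to {band_for_score(example)}")
```

Keeping the bands in one table means a change to a boundary is made in a single place, and out-of-range scores fail loudly rather than silently landing in the nearest band.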
The Seven Dimensions
Does the output demonstrate genuine understanding of the specific situation, audience, and purpose — or does it read as generically applicable?
Are claims, assertions, and recommendations supported by concrete evidence, examples, or reasoning — or asserted without foundation?
Does the output generate insight and implication, or does it describe and summarise without advancing understanding?
Does the structure serve the purpose of the output — guiding the reader efficiently toward understanding or decision — or does it impose form without function?
Are the tone, language level, and relational posture calibrated correctly to the audience, domain, and moment, and do they remain consistent?
Does the output reflect honest, proportionate, and developmentally useful assessment — free from both approval-seeking affirmation and adversarial challenge?
Does the output — and the process that produced it — demonstrate that the right level of critical engagement was applied at the right points?
Evaluation Format
For each of the seven dimensions, evaluation is structured as follows:
Score: [X]/10 — [Band Descriptor]
Strengths:
What the output does well against this dimension — specific and evidence-based
Weaknesses:
Where the output falls short against this dimension — specific and evidence-based
Areas for Development:
The single highest-value improvement that would move this dimension's score upward
After all seven dimensions, a holistic AI Voice assessment is produced, followed by an Overall Summary identifying the two or three most significant findings and the single highest-priority development area.
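One way to keep evaluations consistent with this format is to capture them as structured records. The sketch below is an illustrative Python outline under stated assumptions: the class and field names are hypothetical, the dimension labels are placeholders, and the validation rules only restate the counts given above (seven dimensions, a score out of 10, one area for development per dimension, two or three key findings).

```python
from dataclasses import dataclass
from typing import List

DIMENSION_COUNT = 7  # the rubric defines seven dimensions


@dataclass
class DimensionEvaluation:
    """One dimension's evaluation (hypothetical field names)."""
    dimension: str              # placeholder label for the dimension
    score: int                  # assumed to be a whole number from 0 to 10
    strengths: List[str]        # specific, evidence-based observations
    weaknesses: List[str]       # specific, evidence-based observations
    area_for_development: str   # the single highest-value improvement

    def __post_init__(self) -> None:
        if not 0 <= self.score <= 10:
            raise ValueError("score must be between 0 and 10")
        if not self.area_for_development.strip():
            raise ValueError("exactly one area for development is required")


@dataclass
class EvaluationReport:
    """Full evaluation: seven dimensions, AI Voice assessment, summary."""
    dimensions: List[DimensionEvaluation]
    ai_voice_assessment: str
    key_findings: List[str]           # two or three most significant findings
    priority_development_area: str    # single highest-priority area

    def __post_init__(self) -> None:
        if len(self.dimensions) != DIMENSION_COUNT:
            raise ValueError("all seven dimensions must be evaluated")
        if not 2 <= len(self.key_findings) <= 3:
            raise ValueError("the summary names two or three findings")
```

Rejecting incomplete reports at construction time is a design choice made for this sketch; a softer variant could collect warnings instead of raising.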
Critical Integrity in Application
When applying this rubric, the evaluator — whether human or AI — must observe the following principles:
- Be honest before being kind. Developmental feedback serves the person better than comfortable feedback. Soften the delivery if necessary; do not soften the content.
- Name weaknesses specifically. Vague critique is not useful. Locate the weakness precisely and explain what it costs the output.
- Do not bury concerns in praise. If a dimension scores Partial, say so clearly before noting what worked.
- Do not manufacture concerns. If a dimension genuinely scores Exemplary, say so. Inventing critique to appear rigorous is its own failure of Critical Integrity.
- Maintain dimensional independence. A strong overall impression should not inflate weak dimensions. A weak overall impression should not deflate strong ones.