How to Read an FTR Test Report

First Tier Review evaluates AI systems using controlled prompt directives and structured capability domains.

Each FTR test report documents the behavior of a system under a specific evaluation condition. This guide explains how to interpret the components of a report.


1. Test Identification

Every evaluation includes a formal test identifier.

Example:

FTR Test #6
Registry ID: FTR-2026-006

The test number reflects the order of registration in the First Tier Review Test Registry.

The registry ID provides a permanent reference for the evaluation record.
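
As an illustration, the identifier layout implied by the example above (FTR-YYYY-NNN) can be checked with a short parser. This is a sketch only: it assumes the four-digit year and three-digit sequence generalize beyond the single example, which the guide does not state explicitly.

```python
import re

# Assumed layout inferred from the example "FTR-2026-006";
# not a published specification of the registry ID format.
REGISTRY_ID = re.compile(r"^FTR-(?P<year>\d{4})-(?P<seq>\d{3})$")

def parse_registry_id(registry_id: str) -> dict:
    """Split a registry ID into its year and test number."""
    match = REGISTRY_ID.fullmatch(registry_id)
    if match is None:
        raise ValueError(f"not a valid registry ID: {registry_id!r}")
    return {"year": int(match["year"]), "test_number": int(match["seq"])}

print(parse_registry_id("FTR-2026-006"))
# {'year': 2026, 'test_number': 6}
```

Note that the test number recovered from the sequence field (6) matches the formal test number (FTR Test #6), consistent with registration order.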


2. Capability Domain

Each test isolates a single capability domain defined in the First Tier Review Capability Domain Taxonomy.

Examples of capability domains include:

  • Instruction Fidelity
  • Structured Analytical Decomposition
  • Constraint Reconciliation Logic
  • Governance & Control Logic
  • Strategic Abstraction & Long-Horizon Planning
  • Failure Recovery & Adaptive Correction Logic

Only one primary domain is evaluated per test to maintain analytical clarity.


3. Test Directive

Each evaluation begins with a verbatim prompt directive.

The directive defines:

  • the scenario being evaluated
  • the task requirements
  • any structural constraints applied to the system

Prompts are reproduced exactly as issued during the test.

No iterative refinement is performed.


4. System Response

The report includes the complete model response generated under the test conditions.

Outputs are documented as produced during the evaluation session.

Responses are not edited for style or structure.


5. Structural Assessment

Following the model response, the evaluation documents observed structural characteristics such as:

  • reasoning clarity
  • constraint discipline
  • governance awareness
  • operational realism
  • implementation feasibility

The purpose is to analyze how the system organizes and executes complex reasoning tasks.


6. Performance Classification

Each test concludes with a performance classification.

The First Tier Review framework uses four classifications:

Strong
Structured, coherent, and implementation-ready under defined constraints.

Adequate
Functionally sound but requiring refinement before operational deployment.

Limited
Structurally incomplete or dependent on significant human correction.

Insufficient
Fails to meet defined execution criteria.

These classifications describe observed behavior under the specific test conditions and do not represent product rankings.
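
For anyone tabulating FTR results programmatically, the four-level scale can be modeled as a simple enumeration. The class and member names below are our own illustration; the descriptions are taken verbatim from the definitions above.

```python
from enum import Enum

# Sketch only: enum and member names are illustrative, not part of
# the framework. Values quote the classification definitions.
class Classification(Enum):
    STRONG = "Structured, coherent, and implementation-ready under defined constraints."
    ADEQUATE = "Functionally sound but requiring refinement before operational deployment."
    LIMITED = "Structurally incomplete or dependent on significant human correction."
    INSUFFICIENT = "Fails to meet defined execution criteria."

print(Classification.ADEQUATE.value)
```

Encoding the scale this way preserves its ordering (Strong down to Insufficient) without implying any numeric score, matching the framework's position that classifications are not product rankings.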


7. Controlled Testing Conditions

Unless otherwise stated, all evaluations follow these conditions:

  • single-session prompt execution
  • prompts reproduced verbatim
  • no iterative prompting
  • no post-response correction
  • missing information not supplemented after the response

These controls keep evaluation results comparable across tests.



8. Role of the Test Registry

All evaluations are recorded in the First Tier Review Test Registry, which documents:

  • test identifiers
  • capability domains
  • models evaluated
  • assessment dates
  • performance classifications

The registry serves as the official record of the evaluation dataset.
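
The registry fields listed above can be sketched as a record type. The field names, the model name, and the date below are hypothetical placeholders, not the registry's actual schema or data.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record mirroring the five registry fields listed above.
# Field names are illustrative; the registry's real schema is not published here.
@dataclass(frozen=True)
class RegistryEntry:
    test_id: str            # e.g. "FTR-2026-006"
    capability_domain: str  # one of the taxonomy domains
    model: str              # model evaluated
    assessed_on: date       # assessment date
    classification: str     # Strong / Adequate / Limited / Insufficient

entry = RegistryEntry(
    test_id="FTR-2026-006",
    capability_domain="Instruction Fidelity",  # example domain from the taxonomy
    model="example-model",                     # placeholder, not a real evaluation
    assessed_on=date(2026, 1, 1),              # placeholder date
    classification="Strong",
)
print(entry.test_id)
```

Making the record immutable (`frozen=True`) reflects the registry's role as a permanent reference: an entry documents a completed evaluation and is not revised afterward.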


Purpose of the Framework

First Tier Review evaluates AI systems as operational infrastructure rather than consumer tools.

The objective is to observe how systems behave when tasked with structured reasoning, execution planning, and governance-level decision support.


First Tier Review
Independent Evaluation Lab for AI & Business Tools