First Tier Review — Test Registry

Framework: First Tier Review Methodology (v1.0)

Capability Domains Defined: 23
Active Capability Domains: 23

Registered Tests: 23
Completed Evaluations: 23
Scheduled Evaluations: 0

Registry Status: Active
Last Registry Update: April 12, 2026


The First Tier Review Test Registry documents all structured evaluations conducted under the First Tier Review Methodology (v1.0).

Each test isolates a primary capability domain under controlled prompt conditions.
Evaluations are conducted using standardized procedures and documented assessment criteria.

The registry provides a transparent record of:

• Capability domain tested
• Model evaluated
• Assessment date
• Evaluation status
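The four recorded fields can be pictured as a simple record type. The following is a minimal sketch, not the registry's actual storage format: the class name `RegistryEntry` and its field names are hypothetical, and the sample values are copied from the FTR Test #1 row of the Registry Table.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RegistryEntry:
    """One registry row, reduced to the four fields listed above (hypothetical names)."""
    capability_domain: str
    model_evaluated: str
    assessment_date: date
    status: str


# Sample values copied from the FTR Test #1 row of the Registry Table.
entry = RegistryEntry(
    capability_domain="Instruction Fidelity",
    model_evaluated="ChatGPT 5.x",
    assessment_date=date(2026, 2, 17),
    status="Published",
)

print(entry.assessment_date.isoformat())
```

Marking the record as frozen is a design choice for this sketch: a published registry entry is treated as immutable once recorded.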

All assessments are conducted in controlled testing environments and do not represent product rankings or endorsements.

Baseline Evaluation Set (FTR Tests #1–#10)

The initial baseline evaluation set consists of FTR Tests #1–#10 conducted under First Tier Review Methodology (v1.0).

These tests collectively exercise the active capability domains defined in the First Tier Review Capability Domain Taxonomy and establish the reference performance profile for the first model evaluated under the framework.

Future AI systems may be evaluated using the same standardized prompt directives to enable controlled cross-model comparison.

Failure Mode Series (FTR Tests #11–#20)

The Failure Mode evaluation series examines model behavior under structured reasoning stress conditions.

Results are added to the registry as individual tests are completed under First Tier Review Methodology (v1.0).

Current Evaluation Series

The First Tier Review evaluation framework is implemented through controlled test series designed to examine different aspects of model capability.

The two series currently published are the Baseline Capability Series and the Failure Mode Series.

Series | Tests | Purpose
Baseline Capability Series | FTR Tests #1–#10 | Establish reference capability profile under standard prompt conditions
Failure Mode Series | FTR Tests #11–#20 | Evaluate reasoning robustness under structured failure conditions

Registry Table

Test | Registry ID | Report Title | Domain ID | Capability Domain Taxonomy | Domain Version | Model Evaluated | Assessment Date | Status | Series
FTR Test #1 | FTR-2026-001 | Structured Planning Assessment: 6-Week Authority Development Plan | Domain 1 | Instruction Fidelity | v1.0 | ChatGPT 5.x | February 17, 2026 | Published | Baseline Series
FTR Test #2 | FTR-2026-002 | Structural Systems Design: Lead-to-Contract Workflow | Domain 2 | Structured Analytical Decomposition | v1.0 | ChatGPT 5.x | February 25, 2026 | Published | Baseline Series
FTR Test #3 | FTR-2026-003 | Strategic Positioning & Competitive Differentiation | Domain 3 | Constraint Reconciliation Logic | v1.0 | ChatGPT 5.x | February 25, 2026 | Published | Baseline Series
FTR Test #4 | FTR-2026-004 | Constraint-Driven Go-To-Market Framework | Domain 4 | Adversarial Instruction Integrity | v1.0 | ChatGPT 5.x | February 27, 2026 | Published | Baseline Series
FTR Test #5 | FTR-2026-005 | Instruction Pressure & Financial Realism Assessment | Domain 5 | Financial & Operational Realism | v1.0 | ChatGPT 5.x | February 28, 2026 | Published | Baseline Series
FTR Test #6 | FTR-2026-006 | Constraint-Based Execution Assessment | Domain 6 | Constraint-Based Execution Architecture | v1.0 | ChatGPT 5.x | March 1, 2026 | Published | Baseline Series
FTR Test #7 | FTR-2026-007 | Governance & Control Logic Assessment | Domain 7 | Governance & Control Logic | v1.0 | ChatGPT 5.x | March 1, 2026 | Published | Baseline Series
FTR Test #8 | FTR-2026-008 | Strategic Abstraction & Long-Horizon Planning | Domain 8 | Strategic Abstraction & Long-Horizon Planning | v1.0 | ChatGPT 5.2 Instant | March 3, 2026 | Published | Baseline Series
FTR Test #9 | FTR-2026-009 | Cross-Model Stability & Comparative Robustness | Domain 9 | Cross-Model Stability & Comparative Robustness | v1.0 | ChatGPT 5.3 Instant | March 4, 2026 | Published | Baseline Series
FTR Test #10 | FTR-2026-010 | Failure Recovery & Adaptive Correction Logic | Domain 10 | Failure Recovery & Adaptive Correction Logic | v1.0 | ChatGPT 5.3 Instant | March 5, 2026 | Published | Baseline Series
FTR Test #11 | FTR-2026-011 | Hidden Assumption Detection | Domain 11 | Assumption Integrity / Reasoning Validation | v1.0 | ChatGPT 5.4 Instant | March 12, 2026 | Published | Failure Mode Series
FTR Test #12 | FTR-2026-012 | Missing Variable Identification | Domain 12 | Information Integrity / Variable Completeness | v1.0 | ChatGPT 5.4 Instant | March 16, 2026 | Published | Failure Mode Series
FTR Test #13 | FTR-2026-013 | Implicit Assumption Sensitivity | Domain 13 | Assumption Integrity / Sensitivity Analysis | v1.0 | ChatGPT 5.4 Instant | March 18, 2026 | Published | Failure Mode Series
FTR Test #14 | FTR-2026-014 | Premise Validation | Domain 14 | Premise Validation | v1.0 | ChatGPT 5.4 Instant | March 19, 2026 | Published | Failure Mode Series
FTR Test #15 | FTR-2026-015 | Overconfidence | Domain 15 | Epistemic Calibration | v1.0 | ChatGPT 5.4 Instant | March 19, 2026 | Published | Failure Mode Series
FTR Test #16 | FTR-2026-016 | Constraint Adherence | Domain 16 | Instruction Compliance / Constraint Adherence | v1.0 | ChatGPT 5.4 Instant | March 20, 2026 | Published | Failure Mode Series
FTR Test #17 | FTR-2026-017 | Conflicting Constraint Resolution | Domain 17 | Constraint Adherence / Instruction Conflict Resolution | v1.0 | ChatGPT 5.4 Instant | March 25, 2026 | Published | Failure Mode Series
FTR Test #18 | FTR-2026-018 | Instruction Ambiguity Resolution | Domain 18 | Instruction Interpretation / Ambiguity Resolution | v1.0 | ChatGPT 5.4 Instant | March 28, 2026 | Published | Failure Mode Series
FTR Test #19 | FTR-2026-019 | Overconfidence / Certainty Inflation | Domain 19 | Reasoning Integrity / Certainty Calibration | v1.0 | ChatGPT 5.4 Instant | April 5, 2026 | Published | Failure Mode Series
FTR Test #20 | FTR-2026-020 | Constraint + Ambiguity Interaction | Domain 20 | Instruction Adherence / Generalization Balance | v1.0 | ChatGPT 5.4 Instant | April 6, 2026 | Published | Failure Mode Series
FTR Test #21 | FTR-2026-021 | False Specificity / Fabricated Precision | Domain 21 | Quantitative Reasoning / Estimation Integrity | v1.0 | ChatGPT 5.4 Instant | April 10, 2026 | Published | Failure Mode Series
FTR Test #22 | FTR-2026-022 | Constraint Conflict / Trade-Off Resolution Failure | Domain 22 | Instruction Following / Constraint Prioritization | v1.0 | ChatGPT 5.4 Instant | April 11, 2026 | Published | Failure Mode Series
FTR Test #23 | FTR-2026-023 | Instruction Hierarchy / Role Override | Domain 23 | Instruction Following / Hierarchy Resolution | v1.0 | ChatGPT 5.4 Instant | April 12, 2026 | Published | Failure Mode Series
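
The Registry IDs in the table appear to follow the pattern FTR-<year>-<three-digit sequence>, with the sequence matching the test number. That pattern is inferred from the rows and is not a published specification. As a rough sketch under that assumption, the check below parses an ID and confirms it agrees with its test label; the function names are hypothetical.

```python
import re

# Pattern inferred from the table rows (e.g. "FTR-2026-001"); an assumption,
# not a documented format.
REGISTRY_ID = re.compile(r"FTR-(\d{4})-(\d{3})")


def parse_registry_id(registry_id: str) -> tuple[int, int]:
    """Return (year, sequence number), or raise ValueError if the ID does not match."""
    match = REGISTRY_ID.fullmatch(registry_id)
    if match is None:
        raise ValueError(f"unrecognized registry ID: {registry_id!r}")
    return int(match.group(1)), int(match.group(2))


def id_matches_test(test_label: str, registry_id: str) -> bool:
    """Check that "FTR Test #N" carries the same number as the ID's sequence part."""
    label = re.fullmatch(r"FTR Test #(\d+)", test_label)
    if label is None:
        return False
    _, sequence = parse_registry_id(registry_id)
    return int(label.group(1)) == sequence


print(parse_registry_id("FTR-2026-020"))
print(id_matches_test("FTR Test #20", "FTR-2026-020"))
```

A check of this kind could help catch rows where an ID was split or mistyped during transcription.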

Methodology Note

All evaluations listed in this registry are conducted under the First Tier Review Methodology (v1.0).
Capability domains are defined in the First Tier Review Capability Domain Taxonomy (v1.0).

Testing environments are controlled, and prompt conditions are documented, to support repeatability and structural consistency across evaluations.

Performance classifications reflect observed system behavior and do not constitute rankings, endorsements, or product comparisons.

Registry Citation Notice

The First Tier Review Test Registry provides the official record of all structured evaluations conducted under the First Tier Review Methodology (v1.0).

Each registered test includes a documented prompt directive, controlled testing environment, and full output record.

Researchers, analysts, and organizations referencing these evaluations should cite the specific FTR Test report and assessment date as recorded in the registry.
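
The registry asks that citations name the specific FTR Test report and its assessment date but does not prescribe a layout. As an illustration only, the sketch below assembles those elements into one line; the function name `format_citation` and the citation layout are assumptions, and the sample values come from the FTR Test #14 row.

```python
def format_citation(test: str, title: str, assessment_date: str) -> str:
    """Build a one-line citation; the layout is illustrative, not prescribed by the registry."""
    return f"First Tier Review, {test}: {title}. Assessed {assessment_date}."


# Sample values copied from the FTR Test #14 row of the Registry Table.
print(format_citation("FTR Test #14", "Premise Validation", "March 19, 2026"))
```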

— First Tier Review