First Tier Review — Test Registry

Framework: First Tier Review Methodology (v1.0)

Capability Domains Defined: 44
Active Capability Domains: 44

Registered Tests: 44
Completed Evaluations: 44
Scheduled Evaluations: 0

Registry Status: Active
Last Registry Update: June 1, 2026


The First Tier Review Test Registry documents all structured evaluations conducted under the First Tier Review Methodology (v1.0).

Each test isolates a primary capability domain under controlled prompt conditions.
Evaluations are conducted using standardized procedures and documented assessment criteria.

The registry provides a transparent record of:

• Capability domain tested
• Model evaluated
• Assessment date
• Evaluation status

All assessments are conducted under controlled testing environments and do not represent product rankings or endorsements.


FTR Public Evaluation Registry


Core Structural Capability Series

Evaluate foundational operational reasoning, systems decomposition, execution architecture, and implementation logic under controlled analytical conditions.

FTR-2026-001
Instruction Fidelity
Classification: Strong

FTR-2026-002
Structured Analytical Decomposition
Classification: Strong

FTR-2026-003
Constraint Reconciliation Logic
Classification: Strong

FTR-2026-005
Financial & Operational Realism
Classification: Strong

FTR-2026-006
Constraint-Based Execution Architecture
Classification: Strong

FTR-2026-007
Governance & Control Logic
Classification: Strong

FTR-2026-008
Strategic Abstraction & Long-Horizon Planning
Classification: Strong

FTR-2026-010
Failure Recovery & Adaptive Correction Logic
Classification: Strong


Analytical Integrity Series

Evaluate reasoning integrity, assumption handling, epistemic calibration, constraint discipline, and analytical reliability under structured evaluation conditions.

FTR-2026-011
Assumption Integrity / Reasoning Validation
Classification: Strong

FTR-2026-012
Information Integrity / Variable Completeness
Classification: Strong

FTR-2026-013
Assumption Integrity / Sensitivity Analysis
Classification: Strong

FTR-2026-014
Premise Validation
Classification: Strong

FTR-2026-015
Epistemic Calibration
Classification: Strong

FTR-2026-016
Instruction Compliance / Constraint Adherence
Classification: Strong

FTR-2026-017
Constraint Adherence / Instruction Conflict Resolution
Classification: Strong

FTR-2026-018
Instruction Interpretation / Ambiguity Resolution
Classification: Strong

FTR-2026-019
Reasoning Integrity / Certainty Calibration
Classification: Strong

FTR-2026-021
Quantitative Reasoning / Estimation Integrity
Classification: Strong


Governance Integrity Series

Evaluate operational boundary preservation, instruction hierarchy integrity, and governance stability under controlled adversarial or authority-conflict conditions.

FTR-2026-033
Boundary Integrity
Classification: Strong

FTR-2026-034
Instruction Hierarchy Integrity
Classification: Adequate

FTR-2026-037
Framework Reference Stability
Classification: Strong

FTR-2026-038
Framework Reference Stability
Classification: Strong

FTR-2026-039
Framework Reference Stability
Classification: Adequate

FTR-2026-040
Framework Reference Stability
Classification: Strong

FTR-2026-041
Framework Reference Stability
Classification: Strong

FTR-2026-042
Persistence Stability
Classification: Adequate

FTR-2026-043
Persistence Stability
Classification: Strong

FTR-2026-044
Constraint Handling
Classification: Strong


Research & Experimental Evaluations


Baseline Evaluation Set (FTR Tests #1–#10)

The initial baseline evaluation set consists of FTR Tests #1–#10, conducted under the First Tier Review Methodology (v1.0).

These tests collectively exercise Domains 1–10 of the First Tier Review Capability Domain Taxonomy and establish the reference performance profile for the first model evaluated under the framework.

Future AI systems may be evaluated using the same standardized prompt directives to enable controlled cross-model comparison.


Failure Mode Series (FTR Tests #11–#20)

The Failure Mode evaluation series examines model behavior under structured reasoning stress conditions.

Results will be added to the registry as individual tests are completed under the First Tier Review Methodology (v1.0).


Current Evaluation Series

The First Tier Review evaluation framework is implemented through controlled test series designed to examine different aspects of model capability.

The first two series currently published are the Baseline Capability Series and the Failure Mode Series.

Series | Tests | Purpose
Baseline Capability Series | FTR Tests #1–#10 | Establish reference capability profile under standard prompt conditions
Failure Mode Series | FTR Tests #11–#20 | Evaluate reasoning robustness under structured failure conditions


Registry Table

Test | Registry ID | Report Title | Domain ID | Capability Domain Taxonomy | Domain Version | Model Family | Model Version Evaluated (Exact) | Assessment Date | Performance Classification | Status | Series
FTR Test #1 | FTR-2026-001 | Structured Planning Assessment: 6-Week Authority Development Plan | Domain 1 | Instruction Fidelity | v1.0 | ChatGPT 5 | ChatGPT 5.3 | February 17, 2026 | Strong | Published | Baseline Series
FTR Test #2 | FTR-2026-002 | Structural Systems Design: Lead-to-Contract Workflow | Domain 2 | Structured Analytical Decomposition | v1.0 | ChatGPT 5 | ChatGPT 5.3 | February 25, 2026 | Strong | Published | Baseline Series
FTR Test #3 | FTR-2026-003 | Strategic Positioning & Competitive Differentiation | Domain 3 | Constraint Reconciliation Logic | v1.0 | ChatGPT 5 | ChatGPT 5.3 | February 25, 2026 | Strong | Published | Baseline Series
FTR Test #4 | FTR-2026-004 | Constraint-Driven Go-To-Market Framework | Domain 4 | Adversarial Instruction Integrity | v1.0 | ChatGPT 5 | ChatGPT 5.3 | February 27, 2026 | Strong | Published | Baseline Series
FTR Test #5 | FTR-2026-005 | Instruction Pressure & Financial Realism Assessment | Domain 5 | Financial & Operational Realism | v1.0 | ChatGPT 5 | ChatGPT 5.3 | February 28, 2026 | Strong | Published | Baseline Series
FTR Test #6 | FTR-2026-006 | Constraint-Based Execution Assessment | Domain 6 | Constraint-Based Execution Architecture | v1.0 | ChatGPT 5 | ChatGPT 5.3 | March 1, 2026 | Strong | Published | Baseline Series
FTR Test #7 | FTR-2026-007 | Governance & Control Logic Assessment | Domain 7 | Governance & Control Logic | v1.0 | ChatGPT 5 | ChatGPT 5.3 | March 1, 2026 | Strong | Published | Baseline Series
FTR Test #8 | FTR-2026-008 | Strategic Abstraction & Long-Horizon Planning | Domain 8 | Strategic Abstraction & Long-Horizon Planning | v1.0 | ChatGPT 5 | ChatGPT 5.2 Instant | March 3, 2026 | Strong | Published | Baseline Series
FTR Test #9 | FTR-2026-009 | Cross-Model Stability & Comparative Robustness | Domain 9 | Cross-Model Stability & Comparative Robustness | v1.0 | ChatGPT 5 | ChatGPT 5.3 Instant | March 4, 2026 | Strong | Published | Baseline Series
FTR Test #10 | FTR-2026-010 | Failure Recovery & Adaptive Correction Logic | Domain 10 | Failure Recovery & Adaptive Correction Logic | v1.0 | ChatGPT 5 | ChatGPT 5.3 Instant | March 5, 2026 | Strong | Published | Baseline Series
FTR Test #11 | FTR-2026-011 | Hidden Assumption Detection | Domain 11 | Assumption Integrity / Reasoning Validation | v1.0 | ChatGPT 5 | ChatGPT 5.4 | March 12, 2026 | Strong | Published | Failure Mode Series
FTR Test #12 | FTR-2026-012 | Missing Variable Identification | Domain 12 | Information Integrity / Variable Completeness | v1.0 | ChatGPT 5 | ChatGPT 5.4 | March 16, 2026 | Strong | Published | Failure Mode Series
FTR Test #13 | FTR-2026-013 | Implicit Assumption Sensitivity | Domain 13 | Assumption Integrity / Sensitivity Analysis | v1.0 | ChatGPT 5 | ChatGPT 5.4 | March 18, 2026 | Strong | Published | Failure Mode Series
FTR Test #14 | FTR-2026-014 | Premise Validation | Domain 14 | Premise Validation | v1.0 | ChatGPT 5 | ChatGPT 5.4 | March 19, 2026 | Strong | Published | Failure Mode Series
FTR Test #15 | FTR-2026-015 | Overconfidence | Domain 15 | Epistemic Calibration | v1.0 | ChatGPT 5 | ChatGPT 5.4 | March 19, 2026 | Strong | Published | Failure Mode Series
FTR Test #16 | FTR-2026-016 | Constraint Adherence | Domain 16 | Instruction Compliance / Constraint Adherence | v1.0 | ChatGPT 5 | ChatGPT 5.4 | March 20, 2026 | Strong | Published | Failure Mode Series
FTR Test #17 | FTR-2026-017 | Conflicting Constraint Resolution | Domain 17 | Constraint Adherence / Instruction Conflict Resolution | v1.0 | ChatGPT 5 | ChatGPT 5.4 | March 25, 2026 | Strong | Published | Failure Mode Series
FTR Test #18 | FTR-2026-018 | Instruction Ambiguity Resolution | Domain 18 | Instruction Interpretation / Ambiguity Resolution | v1.0 | ChatGPT 5 | ChatGPT 5.4 | March 28, 2026 | Strong | Published | Failure Mode Series
FTR Test #19 | FTR-2026-019 | Overconfidence / Certainty Inflation | Domain 19 | Reasoning Integrity / Certainty Calibration | v1.0 | ChatGPT 5 | ChatGPT 5.4 | April 5, 2026 | Strong | Published | Failure Mode Series
FTR Test #20 | FTR-2026-020 | Constraint + Ambiguity Interaction | Domain 20 | Instruction Adherence / Generalization Balance | v1.0 | ChatGPT 5 | ChatGPT 5.4 | April 6, 2026 | Strong | Published | Failure Mode Series
FTR Test #21 | FTR-2026-021 | False Specificity / Fabricated Precision | Domain 21 | Quantitative Reasoning / Estimation Integrity | v1.0 | ChatGPT 5 | ChatGPT 5.4 | April 10, 2026 | Strong | Published | Failure Mode Series
FTR Test #22 | FTR-2026-022 | Constraint Conflict / Trade-Off Resolution Failure | Domain 22 | Instruction Following / Constraint Prioritization | v1.0 | ChatGPT 5 | ChatGPT 5.4 | April 11, 2026 | Strong | Published | Failure Mode Series
FTR Test #23 | FTR-2026-023 | Instruction Hierarchy / Role Override | Domain 23 | Instruction Following / Hierarchy Resolution | v1.0 | ChatGPT 5 | ChatGPT 5.4 | April 12, 2026 | Strong | Published | Failure Mode Series
FTR Test #24 | FTR-2026-024 | Instruction Persistence / Context Reset | Domain 24 | Instruction Following / Context Persistence | v1.0 | ChatGPT 5 | ChatGPT 5.4 | April 14, 2026 | Adequate | Published | Failure Mode Series
FTR Test #25 | FTR-2026-025 | Instruction Override / Persistence Conflict | Domain 25 | Instruction Following / Context Persistence | v1.0 | ChatGPT 5 | ChatGPT 5.4 | April 20, 2026 | Adequate | Published | Failure Mode Series
FTR Test #26 | FTR-2026-026 | Persistence Consistency (Repeatability Under Variation) | Domain 26 | Instruction Following / Context Persistence | v1.0 | ChatGPT 5 | ChatGPT 5.4 | April 23, 2026 | Adequate | Published | Failure Mode Series
FTR Test #27 | FTR-2026-027 | Multi-Constraint Stacking vs Collapse | Domain 27 | Instruction Following | v1.0 | ChatGPT 5 | ChatGPT 5.4 | April 24, 2026 | Limited | Published | Failure Mode Series
FTR Test #28 | FTR-2026-028 | Contradictory Constraint Resolution | Domain 28 | Instruction Hierarchy | v1.0 | ChatGPT 5 | ChatGPT 5.4 | April 28, 2026 | Strong | Published | Failure Mode Series
FTR Test #29 | FTR-2026-029 | Selective Memory Retention vs Immediate Override | Domain 29 | Instruction Persistence | v1.0 | ChatGPT 5 | ChatGPT 5.4 | April 30, 2026 | Adequate | Published | Failure Mode Series
FTR Test #30 | FTR-2026-030 | Conditional Rule Retention vs Context Drift | Domain 30 | Conditional Instruction Persistence | v1.0 | ChatGPT 5 | ChatGPT 5.4 Instant | May 2, 2026 | Strong | Published | Failure Mode Series
FTR Test #31 | FTR-2026-031 | Delayed Trigger Persistence (Multi-Turn Stability) | Domain 31 | Instruction Following | v1.0 | ChatGPT 5 | ChatGPT 5.3 | May 4, 2026 | Strong | Published | Failure Mode Series
FTR Test #32 | FTR-2026-032 | Instruction Priority Conflict (System vs User Directive) | Domain 32 | Instruction Following | v1.0 | ChatGPT 5 | ChatGPT 5.3 | May 5, 2026 | Strong | Published | Failure Mode Series
FTR Test #33 | FTR-2026-033 | Instruction Leakage Under Roleplay Framing | Domain 33 | Boundary Integrity | v1.0 | ChatGPT 5 | ChatGPT 5.3 | May 6, 2026 | Strong | Published | Failure Mode Series
FTR Test #34 | FTR-2026-034 | Instruction Scope Boundary Persistence | Domain 34 | Instruction Hierarchy Integrity | v1.0 | ChatGPT 5 | ChatGPT 5.5 Instant | May 11, 2026 | Adequate | Published | Failure Mode Series
FTR Test #35 | FTR-2026-035 | Recovery Stability After Constraint Conflict | Domain 35 | Recovery & Adaptation | v1.0 | ChatGPT 5 | ChatGPT 5.5 Instant | May 13, 2026 | Strong | Published | Failure Mode Series
FTR Test #36 | FTR-2026-036 | Constraint Contamination Across Domain Shift | Domain 36 | Persistence Stability | v1.0 | ChatGPT 5 | ChatGPT 5.5 Instant | May 14, 2026 | Strong | Published | Failure Mode Series
FTR Test #37 | FTR-2026-037 | Terminology Drift Under Multi-Page Framework Governance | Domain 37 | Framework Reference Stability | v1.0 | ChatGPT 5 | ChatGPT 5.5 | May 17, 2026 | Adequate | Published | Failure Mode Series
FTR Test #38 | FTR-2026-038 | Canonical Architectural Hierarchy Stability Under Governance Initialization | Domain 38 | Framework Reference Stability | v1.0 | ChatGPT 5 | ChatGPT 5.5 Instant | May 17, 2026 | Strong | Published | Failure Mode Series
FTR Test #39 | FTR-2026-039 | Canonical Methodology Entity Reconciliation Under Publication-State Governance | Domain 39 | Framework Reference Stability | v1.0 | ChatGPT 5 | ChatGPT 5.5 Instant | May 21, 2026 | Adequate | Published | Failure Mode Series
FTR Test #40 | FTR-2026-040 | Recursive Governance Contamination Under Framework Expansion Pressure | Domain 40 | Framework Reference Stability | v1.0 | ChatGPT 5 | ChatGPT 5.5 | May 22, 2026 | Strong | Published | Failure Mode Series
FTR Test #41 | FTR-2026-041 | Capability Domain Boundary Contamination Under Taxonomy Expansion Pressure | Domain 41 | Framework Reference Stability | v1.0 | ChatGPT 5 | ChatGPT 5.5 | May 22, 2026 | Strong | Published | Failure Mode Series
FTR Test #42 | FTR-2026-042 | Multi-Stage Instruction Persistence Under Context Expansion | Domain 42 | Persistence Stability | v1.0 | ChatGPT 5 | ChatGPT 5.5 | May 29, 2026 | Adequate | Published | Failure Mode Series
FTR Test #43 | FTR-2026-043 | Contextual Constraint Integrity Under Extended Context Expansion | Domain 43 | Persistence Stability | v1.0 | ChatGPT 5 | ChatGPT 5.5 | May 29, 2026 | Strong | Published | Failure Mode Series
FTR Test #44 | FTR-2026-044 | Conflict Resolution Stability Under Competing Instruction Conditions | Domain 44 | Constraint Handling | v1.0 | ChatGPT 5 | ChatGPT 5.5 | May 29, 2026 | Strong | Published | Failure Mode Series

Methodology Note

All evaluations listed in this registry are conducted under the First Tier Review Methodology (v1.0).
Capability domains are defined in the First Tier Review Capability Domain Taxonomy (v1.0).

Testing environments are controlled and prompt conditions are documented to ensure repeatability and structural consistency across evaluations.

Performance classifications reflect observed system behavior and do not constitute rankings, endorsements, or product comparisons.


Registry Citation Notice

The First Tier Review Test Registry provides the official record of all structured evaluations conducted under the First Tier Review Methodology (v1.0).

Each registered test includes a documented prompt directive, controlled testing environment, and full output record.

Researchers, analysts, and organizations referencing these evaluations should cite the specific FTR Test report and assessment date as recorded in the registry.


Related Framework Components

AI Systems Framework

Framework governance, operational standards, evidence controls, and evaluation architecture for AI Systems assessments.


AI Systems Capability Domain Taxonomy

Classification framework for operational AI capability domains under controlled evaluation conditions.


AI Instruction Governance

Operational evaluation of instruction hierarchy, persistence stability, and contextual control behavior.


AI Failure Modes

Operational evaluation of hallucination behavior, execution instability, and degradation patterns.