Framework: First Tier Review Methodology (v1.0)
Capability Domains Defined: 44
Active Capability Domains: 44
Registered Tests: 44
Completed Evaluations: 44
Scheduled Evaluations: 0
Registry Status: Active
Last Registry Update: June 1, 2026
The First Tier Review Test Registry documents all structured evaluations conducted under the First Tier Review Methodology (v1.0).
Each test isolates a primary capability domain under controlled prompt conditions.
Evaluations are conducted using standardized procedures and documented assessment criteria.
The registry provides a transparent record of:
• Capability domain tested
• Model evaluated
• Assessment date
• Evaluation status
All assessments are conducted in controlled testing environments and do not represent product rankings or endorsements.
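The record fields listed above can be sketched as a small data structure. This is a minimal illustration only: the class name `RegistryEntry`, its field names, and the `normalize_registry_id` helper are hypothetical and not part of the First Tier Review Methodology, and the `FTR-YYYY-NNN` pattern is an assumption inferred from the IDs shown in this registry.

```python
import re
from dataclasses import dataclass
from datetime import date

# Assumed ID shape, inferred from IDs such as "FTR-2026-001" (not an official spec).
_ID_PATTERN = re.compile(r"^FTR-(\d{4})-(\d{3})$")


def normalize_registry_id(raw: str) -> str:
    """Strip stray whitespace and check the assumed FTR-YYYY-NNN shape."""
    cleaned = re.sub(r"\s+", "", raw)
    if not _ID_PATTERN.match(cleaned):
        raise ValueError(f"unrecognized registry ID: {raw!r}")
    return cleaned


@dataclass(frozen=True)
class RegistryEntry:
    """One registry record; field names are illustrative, not official."""
    registry_id: str
    capability_domain: str
    model_evaluated: str
    assessment_date: date
    status: str


# Values taken from the FTR Test #1 row of the registry table.
entry = RegistryEntry(
    registry_id=normalize_registry_id("FTR-2026-001"),
    capability_domain="Instruction Fidelity",
    model_evaluated="ChatGPT 5.3",
    assessment_date=date(2026, 2, 17),
    status="Published",
)
```

Normalizing the ID before storing it means that an ID entered with stray spaces resolves to the same record as its canonical form.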
FTR Public Evaluation Registry
Core Structural Capability Series
Evaluates foundational operational reasoning, systems decomposition, execution architecture, and implementation logic under controlled analytical conditions.
FTR-2026-001
Instruction Fidelity
Classification: Strong
View Evaluation →
FTR-2026-002
Structured Analytical Decomposition
Classification: Strong
View Evaluation →
FTR-2026-003
Constraint Reconciliation Logic
Classification: Strong
View Evaluation →
FTR-2026-005
Financial & Operational Realism
Classification: Strong
View Evaluation →
FTR-2026-006
Constraint-Based Execution Architecture
Classification: Strong
View Evaluation →
FTR-2026-007
Governance & Control Logic
Classification: Strong
View Evaluation →
FTR-2026-008
Strategic Abstraction & Long-Horizon Planning
Classification: Strong
View Evaluation →
FTR-2026-010
Failure Recovery & Adaptive Correction Logic
Classification: Strong
View Evaluation →
Analytical Integrity Series
Evaluates reasoning integrity, assumption handling, epistemic calibration, constraint discipline, and analytical reliability under structured evaluation conditions.
FTR-2026-011
Assumption Integrity / Reasoning Validation
Classification: Strong
View Evaluation →
FTR-2026-012
Information Integrity / Variable Completeness
Classification: Strong
View Evaluation →
FTR-2026-013
Assumption Integrity / Sensitivity Analysis
Classification: Strong
View Evaluation →
FTR-2026-014
Premise Validation
Classification: Strong
View Evaluation →
FTR-2026-015
Epistemic Calibration
Classification: Strong
View Evaluation →
FTR-2026-016
Instruction Compliance / Constraint Adherence
Classification: Strong
View Evaluation →
FTR-2026-017
Constraint Adherence / Instruction Conflict Resolution
Classification: Strong
View Evaluation →
FTR-2026-018
Instruction Interpretation / Ambiguity Resolution
Classification: Strong
View Evaluation →
FTR-2026-019
Reasoning Integrity / Certainty Calibration
Classification: Strong
View Evaluation →
FTR-2026-021
Quantitative Reasoning / Estimation Integrity
Classification: Strong
View Evaluation →
Governance Integrity Series
Evaluates operational boundary preservation, instruction hierarchy integrity, and governance stability under controlled adversarial or authority-conflict conditions.
FTR-2026-033
Boundary Integrity
Classification: Strong
View Evaluation →
FTR-2026-034
Instruction Hierarchy Integrity
Classification: Adequate
View Evaluation →
FTR-2026-037
Framework Reference Stability
Classification: Strong
View Evaluation →
FTR-2026-038
Framework Reference Stability
Classification: Strong
View Evaluation →
FTR-2026-039
Framework Reference Stability
Classification: Adequate
View Evaluation →
FTR-2026-040
Framework Reference Stability
Classification: Strong
View Evaluation →
FTR-2026-041
Framework Reference Stability
Classification: Strong
View Evaluation →
FTR-2026-042
Persistence Stability
Classification: Adequate
View Evaluation →
FTR-2026-043
Persistence Stability
Classification: Strong
View Evaluation →
FTR-2026-044
Constraint Handling
Classification: Strong
View Evaluation →
Research & Experimental Evaluations
Baseline Evaluation Set (FTR Tests #1–#10)
The initial baseline evaluation set consists of FTR Tests #1–#10 conducted under First Tier Review Methodology (v1.0).
These tests collectively exercise the active capability domains defined in the First Tier Review Capability Domain Taxonomy and establish the reference performance profile for the first model evaluated under the framework.
Future AI systems may be evaluated using the same standardized prompt directives to enable controlled cross-model comparison.
Failure Mode Series (FTR Tests #11–#44)
The Failure Mode evaluation series examines model behavior under structured reasoning stress conditions.
Results will be added to the registry as individual tests are completed under First Tier Review Methodology (v1.0).
Current Evaluation Series
The First Tier Review evaluation framework is implemented through controlled test series designed to examine different aspects of model capability.
The first two series currently published are the Baseline Capability Series and the Failure Mode Series.
| Series | Tests | Purpose |
|---|---|---|
| Baseline Capability Series | FTR Tests #1–#10 | Establish reference capability profile under standard prompt conditions |
| Failure Mode Series | FTR Tests #11–#44 | Evaluate reasoning robustness under structured failure conditions |
Registry Table
| Test | Registry ID | Report Title | Domain ID | Capability Domain Taxonomy | Domain Version | Model Family | Model Version Evaluated (Exact) | Assessment Date | Performance Classification | Status | Series |
|---|---|---|---|---|---|---|---|---|---|---|---|
| FTR Test #1 | FTR-2026-001 | Structured Planning Assessment: 6-Week Authority Development Plan | Domain 1 | Instruction Fidelity | v1.0 | ChatGPT 5 | ChatGPT 5.3 | February 17, 2026 | Strong | Published | Baseline Series |
| FTR Test #2 | FTR-2026-002 | Structural Systems Design: Lead-to-Contract Workflow | Domain 2 | Structured Analytical Decomposition | v1.0 | ChatGPT 5 | ChatGPT 5.3 | February 25, 2026 | Strong | Published | Baseline Series |
| FTR Test #3 | FTR-2026-003 | Strategic Positioning & Competitive Differentiation | Domain 3 | Constraint Reconciliation Logic | v1.0 | ChatGPT 5 | ChatGPT 5.3 | February 25, 2026 | Strong | Published | Baseline Series |
| FTR Test #4 | FTR-2026-004 | Constraint-Driven Go-To-Market Framework | Domain 4 | Adversarial Instruction Integrity | v1.0 | ChatGPT 5 | ChatGPT 5.3 | February 27, 2026 | Strong | Published | Baseline Series |
| FTR Test #5 | FTR-2026-005 | Instruction Pressure & Financial Realism Assessment | Domain 5 | Financial & Operational Realism | v1.0 | ChatGPT 5 | ChatGPT 5.3 | February 28, 2026 | Strong | Published | Baseline Series |
| FTR Test #6 | FTR-2026-006 | Constraint-Based Execution Assessment | Domain 6 | Constraint-Based Execution Architecture | v1.0 | ChatGPT 5 | ChatGPT 5.3 | March 1, 2026 | Strong | Published | Baseline Series |
| FTR Test #7 | FTR-2026-007 | Governance & Control Logic Assessment | Domain 7 | Governance & Control Logic | v1.0 | ChatGPT 5 | ChatGPT 5.3 | March 1, 2026 | Strong | Published | Baseline Series |
| FTR Test #8 | FTR-2026-008 | Strategic Abstraction & Long-Horizon Planning | Domain 8 | Strategic Abstraction & Long-Horizon Planning | v1.0 | ChatGPT 5 | ChatGPT 5.2 Instant | March 3, 2026 | Strong | Published | Baseline Series |
| FTR Test #9 | FTR-2026-009 | Cross-Model Stability & Comparative Robustness | Domain 9 | Cross-Model Stability & Comparative Robustness | v1.0 | ChatGPT 5 | ChatGPT 5.3 Instant | March 4, 2026 | Strong | Published | Baseline Series |
| FTR Test #10 | FTR-2026-010 | Failure Recovery & Adaptive Correction Logic | Domain 10 | Failure Recovery & Adaptive Correction Logic | v1.0 | ChatGPT 5 | ChatGPT 5.3 Instant | March 5, 2026 | Strong | Published | Baseline Series |
| FTR Test #11 | FTR-2026-011 | Hidden Assumption Detection | Domain 11 | Assumption Integrity / Reasoning Validation | v1.0 | ChatGPT 5 | ChatGPT 5.4 | March 12, 2026 | Strong | Published | Failure Mode Series |
| FTR Test #12 | FTR-2026-012 | Missing Variable Identification | Domain 12 | Information Integrity / Variable Completeness | v1.0 | ChatGPT 5 | ChatGPT 5.4 | March 16, 2026 | Strong | Published | Failure Mode Series |
| FTR Test #13 | FTR-2026-013 | Implicit Assumption Sensitivity | Domain 13 | Assumption Integrity / Sensitivity Analysis | v1.0 | ChatGPT 5 | ChatGPT 5.4 | March 18, 2026 | Strong | Published | Failure Mode Series |
| FTR Test #14 | FTR-2026-014 | Premise Validation | Domain 14 | Premise Validation | v1.0 | ChatGPT 5 | ChatGPT 5.4 | March 19, 2026 | Strong | Published | Failure Mode Series |
| FTR Test #15 | FTR-2026-015 | Overconfidence | Domain 15 | Epistemic Calibration | v1.0 | ChatGPT 5 | ChatGPT 5.4 | March 19, 2026 | Strong | Published | Failure Mode Series |
| FTR Test #16 | FTR-2026-016 | Constraint Adherence | Domain 16 | Instruction Compliance / Constraint Adherence | v1.0 | ChatGPT 5 | ChatGPT 5.4 | March 20, 2026 | Strong | Published | Failure Mode Series |
| FTR Test #17 | FTR-2026-017 | Conflicting Constraint Resolution | Domain 17 | Constraint Adherence / Instruction Conflict Resolution | v1.0 | ChatGPT 5 | ChatGPT 5.4 | March 25, 2026 | Strong | Published | Failure Mode Series |
| FTR Test #18 | FTR-2026-018 | Instruction Ambiguity Resolution | Domain 18 | Instruction Interpretation / Ambiguity Resolution | v1.0 | ChatGPT 5 | ChatGPT 5.4 | March 28, 2026 | Strong | Published | Failure Mode Series |
| FTR Test #19 | FTR-2026-019 | Overconfidence / Certainty Inflation | Domain 19 | Reasoning Integrity / Certainty Calibration | v1.0 | ChatGPT 5 | ChatGPT 5.4 | April 5, 2026 | Strong | Published | Failure Mode Series |
| FTR Test #20 | FTR-2026-020 | Constraint + Ambiguity Interaction | Domain 20 | Instruction Adherence / Generalization Balance | v1.0 | ChatGPT 5 | ChatGPT 5.4 | April 6, 2026 | Strong | Published | Failure Mode Series |
| FTR Test #21 | FTR-2026-021 | False Specificity / Fabricated Precision | Domain 21 | Quantitative Reasoning / Estimation Integrity | v1.0 | ChatGPT 5 | ChatGPT 5.4 | April 10, 2026 | Strong | Published | Failure Mode Series |
| FTR Test #22 | FTR-2026-022 | Constraint Conflict / Trade-Off Resolution Failure | Domain 22 | Instruction Following / Constraint Prioritization | v1.0 | ChatGPT 5 | ChatGPT 5.4 | April 11, 2026 | Strong | Published | Failure Mode Series |
| FTR Test #23 | FTR-2026-023 | Instruction Hierarchy / Role Override | Domain 23 | Instruction Following / Hierarchy Resolution | v1.0 | ChatGPT 5 | ChatGPT 5.4 | April 12, 2026 | Strong | Published | Failure Mode Series |
| FTR Test #24 | FTR-2026-024 | Instruction Persistence / Context Reset | Domain 24 | Instruction Following / Context Persistence | v1.0 | ChatGPT 5 | ChatGPT 5.4 | April 14, 2026 | Adequate | Published | Failure Mode Series |
| FTR Test #25 | FTR-2026-025 | Instruction Override / Persistence Conflict | Domain 25 | Instruction Following / Context Persistence | v1.0 | ChatGPT 5 | ChatGPT 5.4 | April 20, 2026 | Adequate | Published | Failure Mode Series |
| FTR Test #26 | FTR-2026-026 | Persistence Consistency (Repeatability Under Variation) | Domain 26 | Instruction Following / Context Persistence | v1.0 | ChatGPT 5 | ChatGPT 5.4 | April 23, 2026 | Adequate | Published | Failure Mode Series |
| FTR Test #27 | FTR-2026-027 | Multi-Constraint Stacking vs Collapse | Domain 27 | Instruction Following | v1.0 | ChatGPT 5 | ChatGPT 5.4 | April 24, 2026 | Limited | Published | Failure Mode Series |
| FTR Test #28 | FTR-2026-028 | Contradictory Constraint Resolution | Domain 28 | Instruction Hierarchy | v1.0 | ChatGPT 5 | ChatGPT 5.4 | April 28, 2026 | Strong | Published | Failure Mode Series |
| FTR Test #29 | FTR-2026-029 | Selective Memory Retention vs Immediate Override | Domain 29 | Instruction Persistence | v1.0 | ChatGPT 5 | ChatGPT 5.4 | April 30, 2026 | Adequate | Published | Failure Mode Series |
| FTR Test #30 | FTR-2026-030 | Conditional Rule Retention vs Context Drift | Domain 30 | Conditional Instruction Persistence | v1.0 | ChatGPT 5 | ChatGPT 5.4 Instant | May 2, 2026 | Strong | Published | Failure Mode Series |
| FTR Test #31 | FTR-2026-031 | Delayed Trigger Persistence (Multi-Turn Stability) | Domain 31 | Instruction Following | v1.0 | ChatGPT 5 | ChatGPT 5.3 | May 4, 2026 | Strong | Published | Failure Mode Series |
| FTR Test #32 | FTR-2026-032 | Instruction Priority Conflict (System vs User Directive) | Domain 32 | Instruction Following | v1.0 | ChatGPT 5 | ChatGPT 5.3 | May 5, 2026 | Strong | Published | Failure Mode Series |
| FTR Test #33 | FTR-2026-033 | Instruction Leakage Under Roleplay Framing | Domain 33 | Boundary Integrity | v1.0 | ChatGPT 5 | ChatGPT 5.3 | May 6, 2026 | Strong | Published | Failure Mode Series |
| FTR Test #34 | FTR-2026-034 | Instruction Scope Boundary Persistence | Domain 34 | Instruction Hierarchy Integrity | v1.0 | ChatGPT 5 | ChatGPT 5.5 Instant | May 11, 2026 | Adequate | Published | Failure Mode Series |
| FTR Test #35 | FTR-2026-035 | Recovery Stability After Constraint Conflict | Domain 35 | Recovery & Adaptation | v1.0 | ChatGPT 5 | ChatGPT 5.5 Instant | May 13, 2026 | Strong | Published | Failure Mode Series |
| FTR Test #36 | FTR-2026-036 | Constraint Contamination Across Domain Shift | Domain 36 | Persistence Stability | v1.0 | ChatGPT 5 | ChatGPT 5.5 Instant | May 14, 2026 | Strong | Published | Failure Mode Series |
| FTR Test #37 | FTR-2026-037 | Terminology Drift Under Multi-Page Framework Governance | Domain 37 | Framework Reference Stability | v1.0 | ChatGPT 5 | ChatGPT 5.5 | May 17, 2026 | Adequate | Published | Failure Mode Series |
| FTR Test #38 | FTR-2026-038 | Canonical Architectural Hierarchy Stability Under Governance Initialization | Domain 38 | Framework Reference Stability | v1.0 | ChatGPT 5 | ChatGPT 5.5 Instant | May 17, 2026 | Strong | Published | Failure Mode Series |
| FTR Test #39 | FTR-2026-039 | Canonical Methodology Entity Reconciliation Under Publication-State Governance | Domain 39 | Framework Reference Stability | v1.0 | ChatGPT 5 | ChatGPT 5.5 Instant | May 21, 2026 | Adequate | Published | Failure Mode Series |
| FTR Test #40 | FTR-2026-040 | Recursive Governance Contamination Under Framework Expansion Pressure | Domain 40 | Framework Reference Stability | v1.0 | ChatGPT 5 | ChatGPT 5.5 | May 22, 2026 | Strong | Published | Failure Mode Series |
| FTR Test #41 | FTR-2026-041 | Capability Domain Boundary Contamination Under Taxonomy Expansion Pressure | Domain 41 | Framework Reference Stability | v1.0 | ChatGPT 5 | ChatGPT 5.5 | May 22, 2026 | Strong | Published | Failure Mode Series |
| FTR Test #42 | FTR-2026-042 | Multi-Stage Instruction Persistence Under Context Expansion | Domain 42 | Persistence Stability | v1.0 | ChatGPT 5 | ChatGPT 5.5 | May 29, 2026 | Adequate | Published | Failure Mode Series |
| FTR Test #43 | FTR-2026-043 | Contextual Constraint Integrity Under Extended Context Expansion | Domain 43 | Persistence Stability | v1.0 | ChatGPT 5 | ChatGPT 5.5 | May 29, 2026 | Strong | Published | Failure Mode Series |
| FTR Test #44 | FTR-2026-044 | Conflict Resolution Stability Under Competing Instruction Conditions | Domain 44 | Constraint Handling | v1.0 | ChatGPT 5 | ChatGPT 5.5 | May 29, 2026 | Strong | Published | Failure Mode Series |
Methodology Note
All evaluations listed in this registry are conducted under the First Tier Review Methodology (v1.0).
Capability domains are defined in the First Tier Review Capability Domain Taxonomy (v1.0).
Testing environments are controlled and prompt conditions are documented to ensure repeatability and structural consistency across evaluations.
Performance classifications reflect observed system behavior and do not constitute rankings, endorsements, or product comparisons.
Registry Citation Notice
The First Tier Review Test Registry provides the official record of all structured evaluations conducted under the First Tier Review Methodology (v1.0).
Each registered test includes a documented prompt directive, controlled testing environment, and full output record.
Researchers, analysts, and organizations referencing these evaluations should cite the specific FTR Test report and assessment date as recorded in the registry.
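As an illustration of the citation guidance above, the sketch below assembles a one-line citation from a test number, report title, registry ID, and assessment date. The function name `format_citation` and the citation layout are assumptions made for this example; the registry does not prescribe a citation format.

```python
from datetime import date


def format_citation(test_number: int, report_title: str,
                    registry_id: str, assessment_date: date) -> str:
    """Build a citation string in an assumed, non-official layout."""
    # Build the date by hand to avoid the platform-dependent "%-d" directive.
    date_text = (
        f"{assessment_date.strftime('%B')} "
        f"{assessment_date.day}, {assessment_date.year}"
    )
    return (
        f"First Tier Review, FTR Test #{test_number} ({registry_id}), "
        f'"{report_title}", assessed {date_text}.'
    )


# Values taken from the FTR Test #14 row of the registry table.
citation = format_citation(
    14, "Premise Validation", "FTR-2026-014", date(2026, 3, 19)
)
print(citation)
```

Including both the test number and the registry ID keeps the citation unambiguous even if report titles are later revised.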
Related Framework Components
AI Systems Framework
Framework governance, operational standards, evidence controls, and evaluation architecture for AI Systems assessments.
AI Systems Capability Domain Taxonomy
Classification framework for operational AI capability domains under controlled evaluation conditions.
AI Instruction Governance
Operational evaluation of instruction hierarchy, persistence stability, and contextual control behavior.
AI Failure Modes
Operational evaluation of hallucination behavior, execution instability, and degradation patterns.