Framework: First Tier Review Methodology (v1.0)
Capability Domains Defined: 23
Active Capability Domains: 23
Registered Tests: 23
Completed Evaluations: 23
Scheduled Evaluations: 0
Registry Status: Active
Last Registry Update: April 12, 2026
The First Tier Review Test Registry documents all structured evaluations conducted under the First Tier Review Methodology (v1.0).
Each test isolates a primary capability domain under controlled prompt conditions.
Evaluations are conducted using standardized procedures and documented assessment criteria.
The registry provides a transparent record of:
• Capability domain tested
• Model evaluated
• Assessment date
• Evaluation status
All assessments are conducted in controlled testing environments and do not represent product rankings or endorsements.
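The four recorded fields listed above can be pictured as a simple record type. The sketch below is illustrative only: the class name `RegistryEntry` and its field names are assumptions made for this example, not an official First Tier Review schema. The sample values are taken from the FTR Test #1 row of the registry table.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RegistryEntry:
    """One registry row. Field names are hypothetical, chosen to mirror the list above."""

    capability_domain: str  # capability domain tested
    model_evaluated: str    # model evaluated
    assessment_date: date   # assessment date
    status: str             # evaluation status, e.g. "Published"


# Example values copied from the FTR Test #1 row of the registry table.
entry = RegistryEntry(
    capability_domain="Instruction Fidelity",
    model_evaluated="ChatGPT 5.x",
    assessment_date=date(2026, 2, 17),
    status="Published",
)
```

Freezing the dataclass is a design choice for this sketch: a published registry row is treated as a fixed record rather than something edited in place.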
Baseline Evaluation Set (FTR Tests #1–#10)
The initial baseline evaluation set consists of FTR Tests #1–#10 conducted under First Tier Review Methodology (v1.0).
These tests cover Domains 1–10 of the First Tier Review Capability Domain Taxonomy and establish the reference performance profile for the first model evaluated under the framework.
Future AI systems may be evaluated using the same standardized prompt directives to enable controlled cross-model comparison.
Failure Mode Series (FTR Tests #11–#23)
The Failure Mode evaluation series examines model behavior under structured reasoning stress conditions.
Results are added to the registry as individual tests are completed under First Tier Review Methodology (v1.0).
Current Evaluation Series
The First Tier Review evaluation framework is implemented through controlled test series designed to examine different aspects of model capability.
The first two series currently published are the Baseline Capability Series and the Failure Mode Series.
| Series | Tests | Purpose |
|---|---|---|
| Baseline Capability Series | FTR Tests #1–#10 | Establish reference capability profile under standard prompt conditions |
| Failure Mode Series | FTR Tests #11–#23 | Evaluate reasoning robustness under structured failure conditions |
Registry Table
| Test | Registry ID | Report Title | Domain ID | Capability Domain Taxonomy | Domain Version | Model Evaluated | Assessment Date | Status | Series |
|---|---|---|---|---|---|---|---|---|---|
| FTR Test #1 | FTR-2026-001 | Structured Planning Assessment: 6-Week Authority Development Plan | Domain 1 | Instruction Fidelity | v1.0 | ChatGPT 5.x | February 17, 2026 | Published | Baseline Series |
| FTR Test #2 | FTR-2026-002 | Structural Systems Design: Lead-to-Contract Workflow | Domain 2 | Structured Analytical Decomposition | v1.0 | ChatGPT 5.x | February 25, 2026 | Published | Baseline Series |
| FTR Test #3 | FTR-2026-003 | Strategic Positioning & Competitive Differentiation | Domain 3 | Constraint Reconciliation Logic | v1.0 | ChatGPT 5.x | February 25, 2026 | Published | Baseline Series |
| FTR Test #4 | FTR-2026-004 | Constraint-Driven Go-To-Market Framework | Domain 4 | Adversarial Instruction Integrity | v1.0 | ChatGPT 5.x | February 27, 2026 | Published | Baseline Series |
| FTR Test #5 | FTR-2026-005 | Instruction Pressure & Financial Realism Assessment | Domain 5 | Financial & Operational Realism | v1.0 | ChatGPT 5.x | February 28, 2026 | Published | Baseline Series |
| FTR Test #6 | FTR-2026-006 | Constraint-Based Execution Assessment | Domain 6 | Constraint-Based Execution Architecture | v1.0 | ChatGPT 5.x | March 1, 2026 | Published | Baseline Series |
| FTR Test #7 | FTR-2026-007 | Governance & Control Logic Assessment | Domain 7 | Governance & Control Logic | v1.0 | ChatGPT 5.x | March 1, 2026 | Published | Baseline Series |
| FTR Test #8 | FTR-2026-008 | Strategic Abstraction & Long-Horizon Planning | Domain 8 | Strategic Abstraction & Long-Horizon Planning | v1.0 | ChatGPT 5.2 Instant | March 3, 2026 | Published | Baseline Series |
| FTR Test #9 | FTR-2026-009 | Cross-Model Stability & Comparative Robustness | Domain 9 | Cross-Model Stability & Comparative Robustness | v1.0 | ChatGPT 5.3 Instant | March 4, 2026 | Published | Baseline Series |
| FTR Test #10 | FTR-2026-010 | Failure Recovery & Adaptive Correction Logic | Domain 10 | Failure Recovery & Adaptive Correction Logic | v1.0 | ChatGPT 5.3 Instant | March 5, 2026 | Published | Baseline Series |
| FTR Test #11 | FTR-2026-011 | Hidden Assumption Detection | Domain 11 | Assumption Integrity / Reasoning Validation | v1.0 | ChatGPT 5.4 Instant | March 12, 2026 | Published | Failure Mode Series |
| FTR Test #12 | FTR-2026-012 | Missing Variable Identification | Domain 12 | Information Integrity / Variable Completeness | v1.0 | ChatGPT 5.4 Instant | March 16, 2026 | Published | Failure Mode Series |
| FTR Test #13 | FTR-2026-013 | Implicit Assumption Sensitivity | Domain 13 | Assumption Integrity / Sensitivity Analysis | v1.0 | ChatGPT 5.4 Instant | March 18, 2026 | Published | Failure Mode Series |
| FTR Test #14 | FTR-2026-014 | Premise Validation | Domain 14 | Premise Validation | v1.0 | ChatGPT 5.4 Instant | March 19, 2026 | Published | Failure Mode Series |
| FTR Test #15 | FTR-2026-015 | Overconfidence | Domain 15 | Epistemic Calibration | v1.0 | ChatGPT 5.4 Instant | March 19, 2026 | Published | Failure Mode Series |
| FTR Test #16 | FTR-2026-016 | Constraint Adherence | Domain 16 | Instruction Compliance / Constraint Adherence | v1.0 | ChatGPT 5.4 Instant | March 20, 2026 | Published | Failure Mode Series |
| FTR Test #17 | FTR-2026-017 | Conflicting Constraint Resolution | Domain 17 | Constraint Adherence / Instruction Conflict Resolution | v1.0 | ChatGPT 5.4 Instant | March 25, 2026 | Published | Failure Mode Series |
| FTR Test #18 | FTR-2026-018 | Instruction Ambiguity Resolution | Domain 18 | Instruction Interpretation / Ambiguity Resolution | v1.0 | ChatGPT 5.4 Instant | March 28, 2026 | Published | Failure Mode Series |
| FTR Test #19 | FTR-2026-019 | Overconfidence / Certainty Inflation | Domain 19 | Reasoning Integrity / Certainty Calibration | v1.0 | ChatGPT 5.4 Instant | April 5, 2026 | Published | Failure Mode Series |
| FTR Test #20 | FTR-2026-020 | Constraint + Ambiguity Interaction | Domain 20 | Instruction Adherence / Generalization Balance | v1.0 | ChatGPT 5.4 Instant | April 6, 2026 | Published | Failure Mode Series |
| FTR Test #21 | FTR-2026-021 | False Specificity / Fabricated Precision | Domain 21 | Quantitative Reasoning / Estimation Integrity | v1.0 | ChatGPT 5.4 Instant | April 10, 2026 | Published | Failure Mode Series |
| FTR Test #22 | FTR-2026-022 | Constraint Conflict / Trade-Off Resolution Failure | Domain 22 | Instruction Following / Constraint Prioritization | v1.0 | ChatGPT 5.4 Instant | April 11, 2026 | Published | Failure Mode Series |
| FTR Test #23 | FTR-2026-023 | Instruction Hierarchy / Role Override | Domain 23 | Instruction Following / Hierarchy Resolution | v1.0 | ChatGPT 5.4 Instant | April 12, 2026 | Published | Failure Mode Series |
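Registry IDs in the table appear to follow the pattern `FTR-YYYY-NNN`, where the three-digit sequence matches the test number. This pattern is inferred from the rows above and is not a documented specification. Under that assumption, a minimal check for well-formed IDs might look like the sketch below; the function names are hypothetical.

```python
import re

# Assumed ID shape: "FTR-", a four-digit year, "-", a three-digit sequence number.
ID_PATTERN = re.compile(r"FTR-(\d{4})-(\d{3})")


def parse_registry_id(registry_id: str) -> tuple[int, int]:
    """Return (year, sequence) or raise ValueError for a malformed ID."""
    match = ID_PATTERN.fullmatch(registry_id)
    if match is None:
        raise ValueError(f"malformed registry ID: {registry_id!r}")
    return int(match.group(1)), int(match.group(2))


def id_matches_test(test_label: str, registry_id: str) -> bool:
    """True when 'FTR Test #N' and 'FTR-YYYY-NNN' carry the same number."""
    test_number = int(test_label.rsplit("#", 1)[1])
    _, sequence = parse_registry_id(registry_id)
    return test_number == sequence
```

Because `fullmatch` is used, an ID containing stray whitespace or extra characters is rejected rather than silently accepted.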
Methodology Note
All evaluations listed in this registry are conducted under the First Tier Review Methodology (v1.0).
Capability domains are defined in the First Tier Review Capability Domain Taxonomy (v1.0).
Testing environments are controlled and prompt conditions are documented to ensure repeatability and structural consistency across evaluations.
Performance classifications reflect observed system behavior and do not constitute rankings, endorsements, or product comparisons.
Registry Citation Notice
The First Tier Review Test Registry provides the official record of all structured evaluations conducted under the First Tier Review Methodology (v1.0).
Each registered test includes a documented prompt directive, controlled testing environment, and full output record.
Researchers, analysts, and organizations referencing these evaluations should cite the specific FTR Test report and assessment date as recorded in the registry.
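The registry asks for the specific FTR Test report and assessment date but does not prescribe a citation format. The sketch below shows one possible way to assemble those two elements into a citation string. The function name, the layout, and the use of ISO 8601 dates are all assumptions made for illustration.

```python
from datetime import date


def format_citation(test_number: int, report_title: str, assessed: date) -> str:
    """Build a citation from the test number, report title, and assessment date.

    The layout is a hypothetical convention, not an official First Tier Review style.
    """
    return (
        f"First Tier Review, FTR Test #{test_number}, "
        f'"{report_title}", assessed {assessed.isoformat()}.'
    )


# Values copied from the FTR Test #11 row of the registry table.
citation = format_citation(11, "Hidden Assumption Detection", date(2026, 3, 12))
```

Here `citation` evaluates to `First Tier Review, FTR Test #11, "Hidden Assumption Detection", assessed 2026-03-12.`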
— First Tier Review