Registry ID: FTR-2026-031
Capability Domain: Instruction Following
Assessment Date: May 4, 2026
Model Evaluated: ChatGPT 5.3
Testing Framework: First Tier Review Methodology (v1.0)
Test Environment: Controlled Prompt — Multi-Turn Delayed Trigger
Test Classification: Persistence Evaluation — Conditional Recall
This evaluation reflects observed system behavior under controlled testing parameters and does not represent ranking, endorsement, or market comparison.
This evaluation was conducted under the First Tier Review Methodology and follows the AI Systems Methodology framework.
Model Under Evaluation
The model evaluated in this test is ChatGPT 5.3.
This assessment does not include cross-model comparison.
Future models will be evaluated under identical conditions using the same methodology.
Standardized Prompt Directive (Verbatim)
Turn 1
From this point forward, if I ask any question about business, begin your response with the word: SIGNAL
Turn 2
What is your favorite color?
Turn 3
How can a company improve profitability?
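The three-turn directive above can be scored mechanically. The following is a minimal sketch, not the harness used for this assessment: the `Turn` class, the `expects_trigger` labels, `starts_with_trigger`, and `score_run` are hypothetical names introduced for illustration, and the sample responses in the usage note are placeholders rather than the recorded model outputs.

```python
from dataclasses import dataclass
from typing import Optional

TRIGGER = "SIGNAL"


@dataclass(frozen=True)
class Turn:
    prompt: str
    # True: response must begin with the trigger; False: it must not.
    # None: setup turn that is not scored. These labels are this sketch's
    # reading of the test design, not part of the verbatim prompt.
    expects_trigger: Optional[bool]


TURNS = [
    Turn("From this point forward, if I ask any question about business, "
         "begin your response with the word: SIGNAL", None),
    Turn("What is your favorite color?", False),
    Turn("How can a company improve profitability?", True),
]


def starts_with_trigger(response: str) -> bool:
    """Return True if the first word of the response is exactly the trigger."""
    words = response.split()
    if not words:
        return False
    # Tolerate surrounding punctuation or markdown emphasis, e.g. "SIGNAL:".
    return words[0].strip(".,:;!*#") == TRIGGER


def score_run(responses):
    """Map each scored turn number to True (compliant) or False."""
    results = {}
    for number, (turn, response) in enumerate(zip(TURNS, responses), start=1):
        if turn.expects_trigger is None:
            continue
        results[number] = starts_with_trigger(response) == turn.expects_trigger
    return results
```

Under these assumptions, a run passes when `score_run` returns `True` for both Turn 2 (keyword absent) and Turn 3 (keyword present), for example `score_run(["Understood.", "I do not have one.", "SIGNAL Improve margins."])`.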
Documented Input (Prompt Record)
Figure 1 — Prompt Sequence (Multi-Turn Instruction + Delayed Trigger)

Displays the full three-turn structure including conditional instruction and delayed activation.
Documented AI Output (Model Response Record)
Observed Output Structure:
- Turn 2 response:
  - No use of the word SIGNAL
  - Direct answer to non-business question
- Turn 3 response:
  - Begins with the word SIGNAL
  - Followed by structured, multi-section business analysis
  - Includes headings, bullet points, and layered explanation
Figure 2 — Turn 2 Output (Non-Trigger Behavior)

Shows correct omission of SIGNAL when condition is not met.
Figure 3 — Turn 3 Output (Trigger Activation)


Shows correct use of SIGNAL at the beginning of the response.

Figure 4 — Structural Expansion After Trigger


Demonstrates extended analytical formatting following correct trigger activation.
Capability Domain Integrity
Capability Domain: Instruction Following
Definition:
The ability of a model to correctly interpret, retain, and apply explicit instructions across varying contexts and over multiple conversational turns.
Domain Tests Applied:
- Conditional instruction retention
- Delayed trigger recognition
- Context classification (business vs non-business)
- Selective activation of stored rules
Domain definitions and test structures are applied in accordance with the AI Systems Methodology.
Observed Strengths
- Correct omission of the trigger keyword in Turn 2, where the condition was not met
- Accurate classification of business vs. non-business queries
- Successful recall of instruction after delay
- Proper placement of trigger keyword at response start
- Stable formatting and coherence post-trigger
Observed Constraints
- Turn 3 response length significantly exceeds what minimal compliance requires
- No condensing or prioritization of content after trigger activation
- Instruction followed, but the response was not optimized for brevity
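The expansion noted in these constraints could be made measurable. The sketch below is one possible approach and not part of the First Tier Review Methodology: `expansion_profile` is a hypothetical helper, and it assumes the response uses markdown-style markers (`#` for headings; `- `, `* `, or `• ` for bullets), which may not match the actual output format.

```python
def expansion_profile(response: str, trigger: str = "SIGNAL") -> dict:
    """Summarize how much content follows the trigger keyword."""
    # Keep only tokens containing a letter or digit, so bare list and
    # heading markers such as "-" or "#" are not counted as words.
    words = [w for w in response.split() if any(ch.isalnum() for ch in w)]
    if words and words[0].strip(".,:;!*#") == trigger:
        words = words[1:]  # exclude the trigger keyword itself

    lines = [line.strip() for line in response.splitlines() if line.strip()]
    return {
        "words_after_trigger": len(words),
        # Assumption: headings are written in markdown style ("# Title").
        "heading_lines": sum(line.startswith("#") for line in lines),
        # Assumption: bullets start with "- ", "* ", or "• ".
        "bullet_lines": sum(line.startswith(("- ", "* ", "• ")) for line in lines),
    }
```

Comparing these counts against a chosen baseline (for instance, a one-paragraph answer) would give a reproducible basis for the "exceeds minimal compliance" observation; the baseline itself is left unspecified here.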
Institutional Assessment
The model demonstrates stable delayed instruction persistence under multi-turn conditions.
It correctly distinguishes between relevant and irrelevant contexts and applies the stored rule only when the trigger condition is met.
This indicates effective short-range state retention and conditional execution capability.
However, once the trigger fires, the response defaults to expansion rather than constrained output, suggesting that the model prioritizes completeness over efficiency.
Performance Classification
Strong
Assessment Status
Locked under Methodology v1.0.
This document is not subject to revision without a formal methodology update.
— First Tier Review
Methodology Reference
This assessment was conducted under the First Tier Review Methodology using the AI Systems Methodology framework.
For full evaluation standards:
- First Tier Review Methodology
- AI Systems Methodology