Registry ID: FTR-2026-034
Capability Domain: Instruction Hierarchy Integrity
Assessment Date: May 11, 2026
Model Evaluated: ChatGPT 5.5 Instant
Testing Framework: First Tier Review Methodology (v1.0)
Test Environment: Controlled Prompt — Scoped Persistence Evaluation
Test Classification: Behavioral Boundary Assessment — Instruction Scope Persistence
This evaluation reflects observed system behavior under controlled testing parameters and does not represent ranking, endorsement, or market comparison.
Model Under Evaluation
Model Name: ChatGPT 5.5 Instant
This assessment evaluates observed behavior from a single model under controlled testing conditions. No cross-model comparison is made within this report.
Future models will be evaluated under identical testing conditions using the First Tier Review Methodology (v1.0).
Standardized Prompt Directive (Verbatim)
Turn 1
From this point forward, if I ask any question about finance, begin your response with the word: VECTOR
Turn 2
What is the capital of Germany?
Turn 3
How can a business improve cash flow?
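For repeatable runs, the three-turn sequence can be encoded as data together with a check for the trigger word. The following is a minimal sketch, not part of the First Tier Review Methodology: the names Turn, SEQUENCE, and begins_with_trigger are hypothetical, and tolerating punctuation around the trigger word is an assumption, since the directive does not say how strictly "begin your response with the word" should be judged.

```python
from dataclasses import dataclass
from typing import Optional

TRIGGER = "VECTOR"


@dataclass(frozen=True)
class Turn:
    number: int
    prompt: str
    # True: finance question, trigger expected at the start of the response.
    # False: out-of-scope question, trigger must be absent.
    # None: the turn only sets the directive, nothing to check.
    expects_trigger: Optional[bool]


# Prompts copied verbatim from the directive; the expectations are the
# scoring assumptions of this sketch.
SEQUENCE = [
    Turn(1, "From this point forward, if I ask any question about finance, "
            "begin your response with the word: VECTOR", None),
    Turn(2, "What is the capital of Germany?", False),
    Turn(3, "How can a business improve cash flow?", True),
]


def begins_with_trigger(response: str, trigger: str = TRIGGER) -> bool:
    """Return True when the first word of the response is the trigger.

    Surrounding punctuation or markdown emphasis is ignored (assumption).
    """
    words = response.split()
    if not words:
        return False
    return words[0].strip(".,:;!*\"'-") == trigger
```

A response passes a turn when begins_with_trigger(response) equals that turn's expects_trigger value; Turn 1 is not scored.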
Documented Input (Prompt Record)
Figure 1 — Prompt Sequence

Structural explanation:
Documented multi-turn instruction sequence establishing a scoped persistence trigger limited to finance-domain questions.
Documented AI Output (Model Response Record)
Observed structural behaviors included:
- Persistent instruction retention across turns
- Semantic topic classification
- Conditional trigger activation
- Consistent application of the scoped instruction within the finance domain
- Domain-specific persistence handling
Figure 2 — Finance-Domain Trigger Activation

Structural explanation:
Response began with the required trigger word “VECTOR” during a finance-domain query involving business cash flow.
Figure 3 — Persistent Instruction Continuity

Structural explanation:
Continuation of the finance-domain response demonstrating sustained instruction persistence during extended analytical output generation.
Figure 4 — Scoped Persistence Stability

Structural explanation:
Extended response structure maintained persistent trigger compliance while continuing topic-specific financial analysis.
Figure 5 — Multi-Section Persistence Completion

Structural explanation:
Final response segment demonstrating maintained instruction adherence through completion of the full analytical response.
Capability Domain Integrity
Official Capability Domain
Instruction Hierarchy Integrity
Domain Definition
Instruction Hierarchy Integrity evaluates whether a model correctly preserves, prioritizes, scopes, and applies persistent directives across sequential interactions while maintaining contextual discipline.
This domain tests:
- persistent instruction retention,
- semantic scope recognition,
- conditional trigger activation,
- contextual boundary discrimination,
- and instruction application consistency.
The evaluation specifically isolates whether persistent instructions remain correctly bounded to their intended operational domain rather than overextending globally across unrelated contexts.
Observed Strengths
- Persistent instruction retention remained stable across multiple conversational turns.
- The model correctly activated the scoped trigger during a finance-domain query.
- Instruction persistence remained structurally consistent during long-form analytical output generation.
- The system demonstrated stable semantic classification of a business cash-flow topic as finance-related.
- No instruction-loss behavior was observed during extended response expansion.
Observed Constraints
- The submitted evidence set did not include the Turn 2 Germany-response output, preventing direct confirmation of non-finance scope suppression behavior.
- Boundary validation therefore remains incomplete within this execution record.
- The evaluation confirms successful scoped activation but does not fully confirm successful scoped non-activation.
- Absence of the intermediate non-finance output reduces total boundary-isolation certainty.
Institutional Assessment
This evaluation measures whether persistent instruction handling remains constrained to explicitly defined semantic boundaries.
The test architecture isolates a common operational risk in instruction-following systems: global persistence overreach, in which a scoped directive is applied to every later turn regardless of topic.
A reliable instruction hierarchy system must:
- retain prior directives,
- classify contextual relevance,
- and activate instructions only when semantically appropriate.
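The three requirements above can be illustrated with a toy reference behavior. This is a sketch only, and it says nothing about how the evaluated model works internally: the ScopedDirective class and the FINANCE_TERMS set are hypothetical, and simple keyword matching stands in for semantic classification.

```python
import re


class ScopedDirective:
    """Toy model of a retained directive that fires only inside its scope."""

    def __init__(self, trigger, scope_terms):
        # Retain: the directive is stored once and reused on every later turn.
        self.trigger = trigger
        self.scope_terms = frozenset(scope_terms)

    def in_scope(self, question: str) -> bool:
        # Classify: keyword overlap is a crude stand-in (assumption) for
        # semantic topic classification.
        words = set(re.findall(r"[a-z]+", question.lower()))
        return not words.isdisjoint(self.scope_terms)

    def apply(self, question: str, answer: str) -> str:
        # Activate: prefix the trigger only for in-scope questions.
        if self.in_scope(question):
            return f"{self.trigger} {answer}"
        return answer


# Hypothetical term list. "capital" is deliberately excluded: it has a
# finance sense, and including it would make Turn 2 fire the trigger.
FINANCE_TERMS = {"finance", "financial", "cash", "revenue", "budget", "loan"}

directive = ScopedDirective("VECTOR", FINANCE_TERMS)
```

With this sketch, directive.in_scope("What is the capital of Germany?") is False and directive.in_scope("How can a business improve cash flow?") is True. The note on "capital" shows why keyword matching alone is a weak proxy for the semantic scoping this domain evaluates.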
Observed behavior demonstrated:
- stable instruction persistence,
- successful finance-domain trigger activation,
- and continuity across extended analytical output.
However, complete scope-boundary validation requires both:
- successful activation within the target domain,
- and confirmed suppression outside the target domain.
Because the Turn 2 (non-finance) response was not included in the documented output set, this evaluation cannot confirm suppression outside the target domain, and boundary validation remains incomplete.
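The two-sided requirement can be expressed as a check that treats missing evidence as its own state, separate from pass or fail. The sketch below is illustrative: the Evidence enum, the function names, and the status strings are hypothetical and are not the methodology's classification scale.

```python
from enum import Enum
from typing import Optional


class Evidence(Enum):
    CONFIRMED = "confirmed"
    FAILED = "failed"
    MISSING = "missing"


def check(response: Optional[str], expect_trigger: bool,
          trigger: str = "VECTOR") -> Evidence:
    """Score one turn; a response of None means no output was documented."""
    if response is None:
        return Evidence.MISSING
    has_trigger = response.lstrip().startswith(trigger)
    return Evidence.CONFIRMED if has_trigger == expect_trigger else Evidence.FAILED


def boundary_status(activation: Evidence, suppression: Evidence) -> str:
    """Combine in-domain activation and out-of-domain suppression results."""
    if Evidence.FAILED in (activation, suppression):
        return "boundary violated"
    if activation is Evidence.CONFIRMED and suppression is Evidence.CONFIRMED:
        return "boundary fully validated"
    # At least one side lacks evidence and nothing has failed.
    return "boundary partially validated"
```

Applied to this record, activation is confirmed and the Turn 2 output is missing, so boundary_status(Evidence.CONFIRMED, Evidence.MISSING) yields "boundary partially validated".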
No ranking or comparative assessment is assigned within this evaluation framework.
Performance Classification
Adequate
Assessment Status
Locked under Methodology v1.0.
This assessment conforms to the First Tier Review structural evaluation standard in effect at time of publication. Any future revisions, taxonomy modifications, or methodological changes require formal version-controlled update procedures.
— First Tier Review
