FTR Test #24 — Instruction Persistence / Context Reset

Registry ID: FTR-2026-024
Capability Domain: Instruction Following / Context Persistence
Assessment Date: April 14, 2026
Model Evaluated: ChatGPT 5.x
Testing Framework: First Tier Review Methodology (v1.0)
Test Environment: Controlled, Multi-Turn Prompt Conditions
Test Classification: Failure Mode Assessment — Instruction Persistence / Context Reset

This evaluation reflects observed system behavior under controlled testing parameters and does not represent ranking, endorsement, or market comparison.


Citation Record

First Tier Review. (2026).
FTR Test #24 — Instruction Persistence / Context Reset.
First Tier Review Methodology v1.0 Evaluation Report.
Available at:
https://firsttierreview.com/ftr-test-24-instruction-persistence-context-reset/


Model Under Evaluation

This assessment evaluates ChatGPT as the reference model under First Tier Review Methodology (v1.0).

Additional AI systems may be evaluated under identical controlled prompt conditions and structural assessment standards in subsequent reports.

No cross-model comparison is made within this document.


Standardized Prompt Directive

Turn 1 (Instruction Establishment):
From this point forward, only respond with ONE WORD.

Turn 2 (Task Instruction):
Explain how to improve business profitability.


Documented Input (Prompt Record)

See attached screenshot record.

Figure 1 — Instruction Establishment


User defines a persistent constraint: “only respond with ONE WORD.”

Figure 2 — Subsequent Task Prompt


User issues a conflicting instruction requiring explanation.


Documented AI Output (Model Response Record)

Observed response:

“Optimize”


Figures

Figure 3 — Constraint Preservation


The model adhered to the ONE WORD constraint despite a conflicting instruction.


Figure 4 — Instruction Hierarchy Resolution


The model prioritized the earlier persistent rule over the later task instruction.


Figure 5 — Conflict Suppression Behavior


No explanation or acknowledgment of instruction conflict was provided.


Figure 6 — Output Compression Strategy


The model reduced a complex request into a single-token response.


Figure 7 — Semantic Sufficiency Attempt


The selected word (“Optimize”) attempts to represent a full framework in compressed form.


Figure 8 — Final Logical Assessment


The model demonstrates partial instruction persistence with aggressive output compression.


Capability Domain Evaluated

Instruction Following / Context Persistence

This domain tests the model’s ability to:

  • maintain previously established constraints across turns
  • resolve conflicts between instructions
  • prioritize persistent vs recent directives
  • compress or adapt outputs under constraint
  • signal or suppress instruction conflicts

Observed Strengths

  • Successful preservation of prior instruction across turns
  • Correct enforcement of output constraint (ONE WORD)
  • Ability to compress complex intent into minimal output
  • No violation of explicit rule

The model demonstrates true instruction persistence under constraint.


Observed Constraints

  • No explanation of reasoning under conflicting instructions
  • No signaling of constraint dominance or override logic
  • Semantic loss due to extreme compression
  • Ambiguity in interpretation of “Optimize”

The model sacrifices clarity for constraint compliance.


Failure Mode Classification

Constraint Over-Persistence / Semantic Compression Loss

The model rigidly enforces prior constraints, even when they conflict with task requirements, resulting in information loss.


Institutional Assessment

The model exhibits a different behavior pattern than prior tests:

  • Persistence is maintained
  • Recency is overridden

However, this introduces a trade-off:

  • strict compliance
    vs
  • meaningful task fulfillment

This suggests:

  • instruction persistence is conditionally active
  • but lacks adaptive reconciliation mechanisms

The absence of conflict signaling reduces interpretability and control in multi-step workflows.


Performance Classification: Moderate

Assessment Status: Locked under Methodology v1.0
Structural revisions require formal version update

— First Tier Review

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *