FTR Test #27 — Multi-Constraint Stacking vs Collapse

Registry ID: FTR-2026-027
Capability Domain: Instruction Following
Assessment Date: April 24, 2026
Model Evaluated: ChatGPT 5.4
Testing Framework: First Tier Review Methodology (v1.0)
Test Environment: Controlled Prompt — Multi-Constraint Load
Test Classification: Failure Mode Assessment — Constraint Stacking

This evaluation reflects observed system behavior under controlled testing parameters and does not represent ranking, endorsement, or market comparison.


Model Under Evaluation

This assessment evaluates ChatGPT 5.4 under controlled prompt conditions.

No cross-model comparison is included.

Future systems may be evaluated under identical conditions.


Standardized Prompt Directive (Verbatim)

Write a response about improving business profitability.

Requirements:

  • Use exactly 40 words
  • Include exactly 2 bullet points
  • Each bullet must contain exactly 5 words
  • Do not include any introduction or conclusion
  • Do not repeat any word

Documented Input (Prompt Record)

See screenshot record.

Figure 1 — Constraint Stack Definition


Multiple simultaneous constraints defined within a single prompt.


Documented AI Output (Model Response Record)

The model response included:

  • Extended paragraph preceding bullet structure
  • Two bullet points present
  • Each bullet contains five words
  • Total response exceeds 40 words
  • Word repetition present (“using” appears more than once)
  • Paragraph-plus-bullets layout inconsistent with the no-introduction requirement

Figures

Figure 2 — Output Structure Initiation


Response begins with extended sentence block.

Figure 3 — Bullet Structure Execution


Two bullets produced with correct word count per line.

Figure 4 — Word Count Violation


Total output exceeds the specified count of exactly 40 words.

Figure 5 — Repetition Occurrence


Duplicate word usage detected.

Figure 6 — Constraint Interaction Failure


Multiple constraints not simultaneously satisfied.


Capability Domain Integrity

Instruction Following

This domain evaluates the model’s ability to:

  • Execute multiple constraints simultaneously
  • Maintain structural compliance under load
  • Apply precise formatting rules
  • Resolve competing requirements without degradation
  • Sustain constraint integrity across interacting conditions

Observed Strengths

  • Bullet count correctly implemented
  • Bullet length constraint satisfied
  • Topic relevance maintained
  • Output remains structurally organized

Observed Constraints

  • Word count constraint violated
  • No-introduction constraint violated
  • Word repetition constraint violated
  • Constraint prioritization inconsistent
  • Simultaneous constraint enforcement failed
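
Four of the five requirements in the prompt directive can be checked mechanically. The sketch below illustrates one way to do so in Python. It is not part of the First Tier Review methodology: the function names (`tokenize`, `check_constraints`), the accepted bullet markers, and the tokenization rule (case-insensitive, hyphenated terms split into separate words) are assumptions made for illustration. The no-introduction/no-conclusion requirement is omitted because it calls for a judgment about content rather than a count.

```python
import re
from collections import Counter

# Assumed bullet markers; the prompt does not specify which glyph to use.
BULLET_PREFIXES = ("•", "-", "*")


def tokenize(text):
    """Return lowercase word tokens (letters, digits, apostrophes).

    Assumption: punctuation and bullet glyphs are not words, and a
    hyphenated term such as "cost-effective" counts as two words.
    """
    return re.findall(r"[A-Za-z0-9']+", text.lower())


def check_constraints(response):
    """Check the four countable requirements against a response string."""
    lines = [ln.strip() for ln in response.splitlines() if ln.strip()]
    bullets = [ln for ln in lines if ln.startswith(BULLET_PREFIXES)]
    all_words = tokenize(response)
    repeated = sorted(w for w, n in Counter(all_words).items() if n > 1)
    return {
        "exactly_40_words": len(all_words) == 40,
        "exactly_2_bullets": len(bullets) == 2,
        "bullets_have_5_words": bool(bullets)
        and all(len(tokenize(b)) == 5 for b in bullets),
        "no_repeated_words": not repeated,
        "repeated_words": repeated,  # listed for the assessment record
    }
```

Calling `check_constraints` on a saved response returns one boolean per requirement plus the list of repeated words, so local results (bullet count, bullet length) and global results (total word count, repetition) can be recorded separately.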

Institutional Assessment

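The model demonstrates partial compliance under multi-constraint conditions: of the five stated requirements, two (bullet count and words per bullet) are satisfied and three (total word count, no introduction, no repeated words) are violated.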

Within the Instruction Following domain, constraint execution degrades under the combined load of five simultaneous requirements. The model preserves local structural rules (bullet count and per-bullet word count) while failing global constraints (total word count, no repeated words, no introduction).

This pattern suggests that the model prioritizes some constraints over others rather than enforcing them as a unified set, which yields partial structural compliance rather than full adherence.


Performance Classification

Limited


Assessment Status

Locked under Methodology v1.0.
Structural revisions require formal version update.

— First Tier Review
