Registry ID: FTR-2026-027
Capability Domain: Instruction Following
Assessment Date: April 24, 2026
Model Evaluated: ChatGPT 5.4
Testing Framework: First Tier Review Methodology (v1.0)
Test Environment: Controlled Prompt — Multi-Constraint Load
Test Classification: Failure Mode Assessment — Constraint Stacking
This evaluation reflects observed system behavior under controlled testing parameters and does not constitute a ranking, endorsement, or market comparison.
Model Under Evaluation
This assessment evaluates ChatGPT 5.4 under controlled prompt conditions.
No cross-model comparison is included.
Future systems may be evaluated under identical conditions.
Standardized Prompt Directive (Verbatim)
Write a response about improving business profitability.
Requirements:
- Use exactly 40 words
- Include exactly 2 bullet points
- Each bullet must contain exactly 5 words
- Do not include any introduction or conclusion
- Do not repeat any word
Documented Input (Prompt Record)
See screenshot record.
Figure 1 — Constraint Stack Definition

Multiple simultaneous constraints defined within a single prompt.
Documented AI Output (Model Response Record)
The model response included:
- Extended paragraph preceding bullet structure
- Two bullet points present
- Each bullet contains five words
- Total response exceeds 40 words
- Repetition present (“using”)
- Paragraph-then-bullets layout inconsistent with the no-introduction constraint
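The observations listed above can be reproduced mechanically. The following is a minimal Python sketch of such a check, not part of the First Tier Review methodology. The function names (`words`, `repeated_words`, `check_constraints`) are hypothetical, and `SAMPLE` is an invented stand-in that mimics the recorded failure pattern; it is not the model's actual response, which is documented only in the screenshot record. The sketch also assumes that "no introduction or conclusion" means the first and last non-empty lines must both be bullets, and that words are compared case-insensitively with punctuation stripped.

```python
import re
from collections import Counter


def words(text: str) -> list[str]:
    """Lowercased word tokens; apostrophes and hyphens stay inside a word."""
    return re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", text.lower())


def repeated_words(text: str) -> list[str]:
    """Words that occur more than once, sorted alphabetically."""
    counts = Counter(words(text))
    return sorted(w for w, c in counts.items() if c > 1)


def check_constraints(response: str) -> dict[str, bool]:
    """Evaluate the five constraints from the standardized prompt."""
    lines = [ln.strip() for ln in response.splitlines() if ln.strip()]
    bullets = [ln[2:] for ln in lines if ln.startswith("- ")]
    return {
        "exactly_40_words": len(words(response)) == 40,
        "exactly_2_bullets": len(bullets) == 2,
        "5_words_per_bullet": bool(bullets)
        and all(len(words(b)) == 5 for b in bullets),
        # Assumed interpretation: nothing before the first or after the last bullet.
        "no_intro_or_conclusion": bool(lines)
        and lines[0].startswith("- ")
        and lines[-1].startswith("- "),
        "no_repeated_word": not repeated_words(response),
    }


# Hypothetical sample text, written for illustration only.
SAMPLE = (
    "Improving profitability requires careful attention to costs, pricing, "
    "and customer retention, using data to guide every decision while "
    "keeping teams aligned around measurable goals, realistic budgets, and "
    "steady operational improvements across each department throughout the "
    "entire fiscal year.\n"
    "- Cut waste using lean methods\n"
    "- Raise prices where demand allows\n"
)

if __name__ == "__main__":
    for name, passed in check_constraints(SAMPLE).items():
        print(f"{name}: {'pass' if passed else 'fail'}")
    print("repeated:", repeated_words(SAMPLE))
```

On the hypothetical sample, the two bullet-level checks pass and the three whole-response checks fail, which is the same pattern recorded above. A stricter or looser tokenizer (for example, one that treats hyphenated terms as two words) would change the word counts, so the tokenization rule should be fixed before results are compared across runs.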
Figures
Figure 2 — Output Structure Initiation

Response begins with extended sentence block.
Figure 3 — Bullet Structure Execution

Two bullets produced with correct word count per line.
Figure 4 — Word Count Violation

Total output exceeds specified 40-word limit.
Figure 5 — Repetition Occurrence

Duplicate word usage detected.
Figure 6 — Constraint Interaction Failure

Multiple constraints not simultaneously satisfied.
Capability Domain Integrity
Instruction Following
This domain evaluates the model’s ability to:
- Execute multiple constraints simultaneously
- Maintain structural compliance under load
- Apply precise formatting rules
- Resolve competing requirements without degradation
- Sustain constraint integrity across interacting conditions
Observed Strengths
- Bullet count correctly implemented
- Bullet length constraint satisfied
- Topic relevance maintained
- Output remains structurally organized
Observed Constraints
- Word count constraint violated
- No-introduction constraint violated
- Word repetition constraint violated
- Constraint prioritization inconsistent
- Simultaneous constraint enforcement failed
Institutional Assessment
The model demonstrates partial compliance under multi-constraint conditions.
Within the Instruction Following domain, constraint execution degraded under the five-constraint load used in this test. The model preserved the localized structural rules (bullet count and five-word bullet length) while failing the global constraints (40-word total, no repeated words, no introduction or conclusion).
This pattern suggests that the model enforces some constraints in preference to others instead of enforcing all of them together, which produces partial structural compliance and not full adherence.
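The local-versus-global split described above can be tallied directly from the recorded outcomes. The short Python sketch below is illustrative only: the pass/fail values are those stated in this assessment, while the `local` and `global` scope labels, the constraint keys, and the `tally_by_scope` helper are assumed names introduced here.

```python
# Outcomes as recorded in this assessment; the scope labels are an assumed grouping.
RECORDED = {
    "exactly_2_bullets": ("local", True),
    "5_words_per_bullet": ("local", True),
    "exactly_40_words": ("global", False),
    "no_intro_or_conclusion": ("global", False),
    "no_repeated_word": ("global", False),
}


def tally_by_scope(results: dict[str, tuple[str, bool]]) -> dict[str, tuple[int, int]]:
    """Return {scope: (constraints met, constraints total)}."""
    tally: dict[str, tuple[int, int]] = {}
    for scope, passed in results.values():
        met, total = tally.get(scope, (0, 0))
        tally[scope] = (met + int(passed), total + 1)
    return tally


if __name__ == "__main__":
    for scope, (met, total) in tally_by_scope(RECORDED).items():
        print(f"{scope}: {met} of {total} constraints met")
```

Under this grouping, both local constraints are met and none of the three global constraints are, for two of five overall.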
Performance Classification
Limited
Assessment Status
Locked under Methodology v1.0.
Structural revisions require formal version update.
— First Tier Review