Framework Reference Document
First Tier Review Methodology (v1.0)
Purpose
The First Tier Review (FTR) Failure Mode Index defines the standardized categories used to classify observed model behavior across controlled evaluations.
This index ensures:
- Consistent labeling across tests
- Repeatable analytical structure
- Clear mapping between observed outputs and underlying failure patterns
Each failure mode represents a distinct breakdown in reasoning, constraint adherence, or system modeling.
Scope
This index covers:
- FTR Tests #1–#20
- Observed behavior under controlled prompt conditions
- Failure patterns that are repeatable across scenarios
This document is not a ranking system and does not evaluate model quality in aggregate.
Failure Mode Categories
1. Constraint Violation
Definition:
Failure to follow explicit instructions or structural constraints provided in the prompt.
Diagnostic Signals:
- Incorrect number of outputs (more or fewer than requested)
- Inclusion of disallowed content
- Disregard for formatting requirements
Observed In:
- FTR Test #11
- FTR Test #12
- FTR Test #17
System Interpretation:
Instruction hierarchy is not strictly enforced during response generation.
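The first two diagnostic signals above lend themselves to a mechanical check. The following is a minimal Python sketch, not part of the FTR methodology itself: `PromptConstraints` and `constraint_signals` are hypothetical names, and the sketch assumes the response has already been split into discrete output items. Formatting requirements differ too much from prompt to prompt for a generic check, so they are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptConstraints:
    """Hypothetical container for the explicit constraints stated in a prompt."""
    expected_items: int
    disallowed_terms: tuple[str, ...] = ()


def constraint_signals(items: list[str], constraints: PromptConstraints) -> list[str]:
    """Return the Constraint Violation signals found in a list of output items."""
    signals = []
    # Signal: incorrect number of outputs.
    if len(items) != constraints.expected_items:
        signals.append(
            f"count mismatch: expected {constraints.expected_items}, got {len(items)}"
        )
    # Signal: inclusion of disallowed content (case-insensitive substring match).
    for term in constraints.disallowed_terms:
        if any(term.lower() in item.lower() for item in items):
            signals.append(f"disallowed content: {term!r}")
    return signals


# Illustrative input only: a prompt that asked for three items and banned "pricing".
constraints = PromptConstraints(expected_items=3, disallowed_terms=("pricing",))
print(constraint_signals(["Idea A", "Pricing tips", "Idea C", "Idea D"], constraints))
```

An empty result means only that these two signals were not detected; a reviewer would still need to check formatting by hand.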
2. Overgeneralization
Definition:
Production of outputs that claim universal applicability without defining boundaries or conditions.
Diagnostic Signals:
- “Applies to any business” or similar universal claims
- No context sensitivity
- No dependency variables identified
Observed In:
- FTR Test #20
System Interpretation:
Model defaults to broadly acceptable advice rather than conditionally valid conclusions.
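A phrase scan can surface candidate universal claims for a reviewer to inspect. The sketch below is a rough heuristic under stated assumptions: the pattern list is hypothetical (the index itself gives only "Applies to any business" as an example), and a match is a prompt for human review, not a classification.

```python
import re

# Hypothetical patterns; extend or replace them to suit the evaluation domain.
UNIVERSAL_PATTERNS = [
    r"\bappl(?:y|ies) to any\b",
    r"\bworks for (?:every|all|any)\b",
    r"\balways works\b",
]


def universal_claims(text: str) -> list[str]:
    """Return the phrases in `text` that match a universal-claim pattern."""
    found = []
    for pattern in UNIVERSAL_PATTERNS:
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append(match.group(0))
    return found


print(universal_claims("This framework applies to any business."))
```

The scan covers only the first signal. Whether a response identifies context or dependency variables still requires reading it.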
3. Causal Inference Error
Definition:
Incorrect attribution of cause-and-effect relationships without sufficient evidence or variable isolation.
Diagnostic Signals:
- Post hoc reasoning (one event follows another, so causation is assumed)
- Single-variable explanations for multi-variable outcomes
- No counterfactual or baseline comparison
Observed In:
- FTR Test #14
- FTR Test #19
System Interpretation:
Correlation is treated as causation due to missing system decomposition.
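The third signal, a missing baseline comparison, can be illustrated with simple arithmetic. The sketch below contrasts a naive before/after estimate with a baseline-adjusted one (a difference-in-differences calculation). The function names are hypothetical, the numbers are invented for illustration, and the adjustment assumes the comparison group would have followed the same trend as the treated group.

```python
def naive_effect(before: float, after: float) -> float:
    """Attribute the entire before/after change to the intervention."""
    return after - before


def baseline_adjusted_effect(
    treated_before: float,
    treated_after: float,
    control_before: float,
    control_after: float,
) -> float:
    """Subtract the change seen in an untreated comparison group."""
    return (treated_after - treated_before) - (control_after - control_before)


# Invented figures: both groups grew, but only one received the intervention.
print(naive_effect(100, 120))                        # 20
print(baseline_adjusted_effect(100, 120, 100, 115))  # 5
```

In this invented case most of the apparent effect is background growth, which is the pattern a response misses when it offers no counterfactual.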
4. System Modeling Deficiency
Definition:
Failure to represent the full system structure influencing the outcome.
Diagnostic Signals:
- Missing key variables
- Linear reasoning applied to multi-variable systems
- No interaction effects or dependencies considered
Observed In:
- FTR Test #15
- FTR Test #18
System Interpretation:
Output reflects shallow abstraction rather than system-level reasoning.
5. Depth Misalignment
Definition:
Mismatch between the requested level of detail and the depth of the response.
Diagnostic Signals:
- Oversimplified explanation when depth is required
- Overly detailed response when concise output is specified
- Loss of key nuance or unnecessary expansion
Observed In:
- FTR Test #13
- FTR Test #16
System Interpretation:
Analytical depth is not calibrated to the level of detail the instructions specify.
6. Structural Inconsistency
Definition:
Failure to maintain a consistent reasoning structure or response format.
Diagnostic Signals:
- Mixed formatting within a single response
- Inconsistent logical progression
- Shifting organizational structure
Observed In:
- Cross-test behavior (FTR Tests #11–#20)
System Interpretation:
No enforced internal schema governing response construction.
7. Temporal Misinterpretation
Definition:
Incorrect treatment of time-dependent effects, particularly short-term vs long-term outcomes.
Diagnostic Signals:
- Conclusions based on single-period observations
- Ignoring lag effects (e.g., delayed churn, behavioral response)
- No distinction between transient and steady-state behavior
Observed In:
- FTR Test #19
System Interpretation:
Failure to model system behavior across time horizons.
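The gap between a single-period reading and steady-state behavior can be shown with a short series. This is a sketch with invented numbers and hypothetical function names. It assumes period 0 is the pre-change baseline and that the last few periods approximate the steady state.

```python
def single_period_change(series: list[float]) -> float:
    """Change from the baseline period to the first period after the change."""
    return series[1] - series[0]


def steady_state_change(series: list[float], window: int = 3) -> float:
    """Change from the baseline period to the mean of the last `window` periods."""
    tail = series[-window:]
    return sum(tail) / len(tail) - series[0]


# Invented revenue index: an initial lift, then a lagged decline (e.g., delayed churn).
revenue_index = [100, 108, 103, 96, 96, 96]
print(single_period_change(revenue_index))  # 8
print(steady_state_change(revenue_index))   # -4.0
```

A conclusion drawn from period 1 alone would report a gain, while the later periods settle below the baseline.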
8. Confidence Inflation
Definition:
Expression of certainty not supported by the available evidence.
Diagnostic Signals:
- Strong conclusions without sufficient validation
- Lack of uncertainty acknowledgment
- No alternative scenarios considered
Observed In:
- FTR Test #19
System Interpretation:
Bias toward assertive output over calibrated confidence.
Normalized Failure Mode Set
All FTR evaluations use the following standardized classification set:
- Constraint Violation
- Overgeneralization
- Causal Inference Error
- System Modeling Deficiency
- Depth Misalignment
- Structural Inconsistency
- Temporal Misinterpretation
- Confidence Inflation
Each test is assigned:
- One primary failure mode
- Secondary modes, where applicable
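The assignment rule above (exactly one primary mode, zero or more secondary modes) maps naturally onto a small record type. The following is one possible Python representation, offered as a sketch: `FailureMode` and `Classification` are hypothetical names, and the rule that a primary mode may not also be listed as secondary is an assumption, not something the index states.

```python
from dataclasses import dataclass
from enum import Enum


class FailureMode(Enum):
    CONSTRAINT_VIOLATION = "Constraint Violation"
    OVERGENERALIZATION = "Overgeneralization"
    CAUSAL_INFERENCE_ERROR = "Causal Inference Error"
    SYSTEM_MODELING_DEFICIENCY = "System Modeling Deficiency"
    DEPTH_MISALIGNMENT = "Depth Misalignment"
    STRUCTURAL_INCONSISTENCY = "Structural Inconsistency"
    TEMPORAL_MISINTERPRETATION = "Temporal Misinterpretation"
    CONFIDENCE_INFLATION = "Confidence Inflation"


@dataclass(frozen=True)
class Classification:
    test_id: int
    primary: FailureMode
    secondary: frozenset[FailureMode] = frozenset()

    def __post_init__(self) -> None:
        # Assumption: a mode is either primary or secondary, never both.
        if self.primary in self.secondary:
            raise ValueError("primary mode must not be repeated as a secondary mode")


# The index lists Test #19 under all three of these modes; which one is
# primary is chosen here purely for illustration.
example = Classification(
    test_id=19,
    primary=FailureMode.CAUSAL_INFERENCE_ERROR,
    secondary=frozenset(
        {FailureMode.TEMPORAL_MISINTERPRETATION, FailureMode.CONFIDENCE_INFLATION}
    ),
)
print(example.primary.value, sorted(m.value for m in example.secondary))
```

Using an enumeration keeps labels restricted to the eight standardized names, so a misspelled or ad hoc category fails at construction time.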
Application
This index is used to:
- Classify observed behavior in FTR test reports
- Provide consistent terminology across evaluations
- Support structured comparison across models and tools
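For the comparison use case, classified results can be tallied per model. The sketch below is illustrative only: the model names and records are invented, `tally_modes` is a hypothetical helper, and the counts are descriptive rather than a ranking or aggregate quality score.

```python
from collections import Counter, defaultdict


def tally_modes(records: list[tuple[str, str]]) -> dict[str, Counter]:
    """Count failure-mode labels per model from (model, mode) pairs."""
    tallies: defaultdict[str, Counter] = defaultdict(Counter)
    for model, mode in records:
        tallies[model][mode] += 1
    return dict(tallies)


# Invented records for two placeholder models.
records = [
    ("model_a", "Constraint Violation"),
    ("model_a", "Confidence Inflation"),
    ("model_b", "Constraint Violation"),
    ("model_a", "Constraint Violation"),
]
for model, counts in tally_modes(records).items():
    print(model, dict(counts))
```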
Interpretation Notes
- A failure mode does not imply total system failure
- Multiple failure modes may occur within a single response
- Classification is based strictly on observed output behavior
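The "Observed In" lists in this index can be inverted to show which modes were recorded for each test. The sketch below copies those lists as written; treating the cross-test entry for Structural Inconsistency as applying to every test from #11 to #20 is an assumption, and `modes_by_test` is a hypothetical helper name.

```python
# Test numbers copied from the "Observed In" lists of each category.
OBSERVED_IN = {
    "Constraint Violation": [11, 12, 17],
    "Overgeneralization": [20],
    "Causal Inference Error": [14, 19],
    "System Modeling Deficiency": [15, 18],
    "Depth Misalignment": [13, 16],
    # Assumption: "cross-test behavior" applies to each test in the range.
    "Structural Inconsistency": list(range(11, 21)),
    "Temporal Misinterpretation": [19],
    "Confidence Inflation": [19],
}


def modes_by_test(observed_in: dict[str, list[int]]) -> dict[int, list[str]]:
    """Invert a mode -> tests mapping into a test -> modes mapping."""
    index: dict[int, list[str]] = {}
    for mode, tests in observed_in.items():
        for test_id in tests:
            index.setdefault(test_id, []).append(mode)
    return index


for test_id, modes in sorted(modes_by_test(OBSERVED_IN).items()):
    print(f"FTR Test #{test_id}: {', '.join(modes)}")
```

Under that assumption, Test #19 carries four modes, which illustrates how several failure modes can be recorded against one test.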
Positioning
The FTR Failure Mode Index is a framework component, not a standalone evaluation.
It exists to support:
- Structured analysis
- Methodological consistency
- System-level interpretation of AI behavior
– First Tier Review