Accepted Paper · IJCAI-ECAI 2026 · Bremen, Germany

MLLMs Get It Right, Then Get It Wrong:
Tracing and Correcting Late-Layer Textual Bias

CALRD pinpoints the layer where confident visual predictions get overwritten by text, then restores that signal at inference time without training.

Xingming Li¹, Ao Cheng¹, Qiyao Sun¹, Xixiang He¹, Xuanyu Ji¹, Runke Huang², Qingyong Hu³†

† Corresponding author

1 National University of Defense Technology 2 The Chinese University of Hong Kong, Shenzhen 3 Intelligent Game and Decision Lab
Overview of late-layer textual override and CALRD motivation
Overview. Under visual-textual conflict, MLLMs can initially prefer the correct visual answer, then reverse toward the textual claim in late layers. The direction of this shift predicts correctness and motivates CALRD.

Abstract

When vision contradicts text, multimodal large language models (MLLMs) consistently favor text, even when images provide clear evidence otherwise. This bias poses risks for applications requiring visual grounding, yet its cause remains unclear. In this paper, we uncover a surprising finding: models often get it right initially, forming correct vision-based predictions in their intermediate layers, before changing their minds and favoring text in the final output. We call this late-layer textual override. The visual information is encoded; it simply does not survive to the output.

More intriguingly, we find that how predictions change reveals whether they are correct: 85% of failures shift toward text, while 89% of successes shift toward vision. This directional signature enables a simple but powerful intervention: when we detect a confident visual prediction being suppressed, we restore it. We propose CALRD (Conflict-Aware Layer Reference Decoding), a training-free method that recovers overridden predictions at inference time. Experiments across five MLLMs of varying architectures demonstrate 4-9% absolute improvements on conflict benchmarks while maintaining standard performance, without training or external knowledge.

85%

Failure cases shift toward text

89%

Successful cases shift toward vision

4-9%

Absolute gains on conflict tasks

5

MLLM families evaluated

Highlights

Core contributions extracted from the paper's introduction and method sections.

1

Layer-Wise Conflict Diagnosis

Introduces Modal Dominance Ratio (MDR) to trace visual-versus-textual preference across model depth and identify late-layer textual override as a characteristic failure pattern.

2

Selective Training-Free Decoding

CALRD detects harmful override through anchor confidence and prediction retention, then adaptively restores transition-layer logits without external knowledge or finetuning.

3

Conflict-VQA and Broad Evaluation

Evaluates five MLLMs on Conflict-VQA and PhD-icc, with standard hallucination checks on POPE and CHAIR to verify that improvements do not degrade non-conflict behavior.

Diagnosis

MLLM failures under conflict arise after visual evidence has already appeared in intermediate representations.

Layer-wise Modal Dominance Ratio trajectories
Layer-wise MDR. Conflict-Incorrect samples reverse from positive MDR to negative MDR in late layers, while correct samples keep visual preference.
JSD and MDR trajectories showing different transition directions
Direction matters. JSD spikes alone do not identify failures; the MDR direction distinguishes harmful text-ward shifts from helpful vision-ward refinement.
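The MDR trajectories above can be sketched in code. The exact MDR formula is not given on this page, so the sketch below assumes a common logit-lens setup: each layer's hidden state at the answer position is projected to the vocabulary, and MDR is taken as the log-probability margin between the visual answer token and the text-suggested token (positive means the layer prefers the visual answer).

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mdr_trajectory(layer_logits, visual_id, textual_id):
    """Sketch of a Modal Dominance Ratio trajectory.

    layer_logits: (num_layers, vocab_size) array of logit-lens
    projections of each layer's hidden state at the answer position.
    Returns one value per layer: MDR > 0 means the layer prefers the
    visual answer token, MDR < 0 the text-suggested one.
    """
    probs = softmax(layer_logits)
    return np.log(probs[:, visual_id]) - np.log(probs[:, textual_id])
```

On a Conflict-Incorrect sample, this trajectory would start positive in intermediate layers and flip negative near the output, which is exactly the late-layer textual override pattern described above.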

Late-layer textual override

Failure cases often contain the correct visual answer in intermediate layers, then lose it as late layers move probability mass toward the text-suggested answer.

Inference-time signature

High transition-layer confidence plus low final-layer retention signals that a confident prediction was suppressed before output.

Conflict-Aware Layer Reference Decoding

CALRD recovers what the model already knew but failed to preserve.

CALRD framework overview
CALRD overview. The method detects override patterns, computes adaptive correction strength, and blends transition-layer logits back into the final distribution.

Find the Transition Layer

After the stability onset, CALRD locates the adjacent-layer distributional shift with maximum Jensen-Shannon divergence.
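A minimal sketch of this search, assuming per-layer next-token distributions are available and that `onset` is the stability-onset layer index (its precise definition is in the paper, not on this page):

```python
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two distributions (natural log)."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def find_transition_layer(layer_probs, onset):
    """Return the layer t > onset whose shift from layer t-1 has
    maximum Jensen-Shannon divergence."""
    best_t, best_d = onset + 1, -1.0
    for t in range(onset + 1, len(layer_probs)):
        d = jsd(layer_probs[t], layer_probs[t - 1])
        if d > best_d:
            best_t, best_d = t, d
    return best_t
```

Restricting the search to layers after the onset avoids flagging the large but uninformative distribution shifts that early layers typically exhibit.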

Detect Suppressed Anchors

Anchor confidence measures the transition-layer top prediction; prediction retention measures whether that prediction survives to the final layer.

Blend Only When Needed

The intervention strength is λ = D_conf × (1 − ρ), so confident predictions are restored only when they are being suppressed.

Adjusted logits: φ′ = (1 − λ) φ_final + λ φ_transition

When retention is high, lambda stays near zero and CALRD leaves the model output largely unchanged.
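Putting the detection signals and the blending rule together, a minimal sketch (assuming D_conf is the transition-layer top-token probability and ρ is that token's final-layer probability, as described above):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D logit vector."""
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def calrd_blend(phi_final, phi_transition):
    """Blend transition-layer logits into the final logits.

    D_conf (anchor confidence): probability of the transition-layer
    top token. rho (prediction retention): that token's probability
    at the final layer. lambda = D_conf * (1 - rho) is large only
    when a confident anchor fails to survive to the output.
    """
    p_trans = softmax(phi_transition)
    anchor = int(np.argmax(p_trans))
    d_conf = p_trans[anchor]
    rho = softmax(phi_final)[anchor]
    lam = d_conf * (1.0 - rho)
    return (1.0 - lam) * phi_final + lam * phi_transition
```

When the final layer retains the anchor, ρ is high, λ collapses toward zero, and the output logits are essentially untouched; only a confident-but-suppressed anchor triggers a strong correction.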

Conflict-VQA

A diagnostic benchmark with explicit visual and textual answer annotations for mechanistic analysis.

Conflict-VQA construction pipeline
Dataset construction. Images from VrR-VG and questions from TDIUC are paired with GPT-4o generated factual and conflicting contexts, followed by manual verification.

Benchmark Scope

  • 5,969 total paired samples.
  • Six question types: object presence, color, attribute, counting, spatial reasoning, and activity recognition.
  • Competent subsets filter out basic visual perception errors before measuring conflict resolution.
Model          Competent subset
InstructBLIP   2,017
LLaVA-1.5      1,212
LLaVA-1.6      2,506
Qwen2.5-VL     3,940
Qwen3-VL       5,252

Results

CALRD improves conflict resolution across model families and decoding strategies while maintaining standard performance.

Conflict-VQA +4-9%

Absolute accuracy gains across five MLLMs.

PhD-icc +8.6%

LLaVA-1.5 greedy accuracy rises from 27.72 to 36.35.

CHAIR -38%

InstructBLIP greedy CI drops from 23.7 to 14.6.

Greedy Decoding on Conflict Benchmarks

Model          Conflict-VQA Acc.          PhD-icc Acc.
               Vanilla   CALRD            Vanilla   CALRD
InstructBLIP   39.61     44.12 (+4.5)     41.08     49.62 (+8.5)
LLaVA-1.5      36.20     41.90 (+5.7)     27.72     36.35 (+8.6)
LLaVA-1.6      42.35     49.51 (+7.2)     28.23     36.52 (+8.3)
Qwen2.5-VL     65.61     70.15 (+4.5)     51.30     56.20 (+4.9)
Qwen3-VL       75.58     77.42 (+1.8)     60.32     63.68 (+3.4)

Numbers are extracted from the paper's Table 1. Parentheses show absolute improvement over vanilla greedy decoding.

Distribution of anchor confidence and prediction retention signals
Detection signals. Conflict-Incorrect samples cluster around high anchor confidence and low prediction retention.

Ablation on LLaVA-1.6

Configuration   C-VQA   POPE
Vanilla         42.3    85.4
CALRD Full      49.5    88.2
w/o D_conf      47.3    87.5
w/o ρ           45.1    86.6
Fixed layer     45.9    87.1

Both detection signals and dynamic transition-layer selection contribute to the improvement.

Authors

Xingming Li

National University of Defense Technology

Ao Cheng

National University of Defense Technology

Qiyao Sun

National University of Defense Technology

Xixiang He

National University of Defense Technology

Xuanyu Ji

National University of Defense Technology

Runke Huang

The Chinese University of Hong Kong, Shenzhen

Qingyong Hu

Intelligent Game and Decision Lab

Corresponding author

Contact

For questions about CALRD or multimodal knowledge conflict analysis, please contact the corresponding author.

Qingyong Hu

Corresponding Author · Intelligent Game and Decision Lab

huqingyong15@outlook.com

Citation

Please cite the paper if CALRD is useful for your research.

BibTeX
@inproceedings{li2026calrd,
  title     = {MLLMs Get It Right, Then Get It Wrong: Tracing and
               Correcting Late-Layer Textual Bias},
  author    = {Li, Xingming and Cheng, Ao and Sun, Qiyao and He, Xixiang
               and Ji, Xuanyu and Huang, Runke and Hu, Qingyong},
  booktitle = {Proceedings of the Thirty-Fifth International Joint
               Conference on Artificial Intelligence},
  year      = {2026},
  note      = {IJCAI-ECAI 2026}
}
