Layer-Wise Conflict Diagnosis
CALRD pinpoints the layer where confident visual predictions get overwritten by text, then restores that signal at inference time without training.
When vision contradicts text, multimodal large language models (MLLMs) consistently favor text, even when images provide clear evidence otherwise. This bias poses risks for applications requiring visual grounding, yet its cause remains unclear. In this paper, we uncover a surprising finding: models often get it right initially, forming correct vision-based predictions in their intermediate layers, before changing their minds and favoring text in the final output. We call this late-layer textual override. The visual information is encoded; it simply does not survive to the output.
More intriguingly, we find that how predictions change reveals whether they are correct: 85% of failures shift toward text, while 89% of successes shift toward vision. This directional signature enables a simple but powerful intervention: when we detect a confident visual prediction being suppressed, we restore it. We propose CALRD (Conflict-Aware Layer Reference Decoding), a training-free method that recovers overridden predictions at inference time. Experiments across five MLLMs of varying architectures demonstrate 4-9% absolute improvements on conflict benchmarks while maintaining standard performance, without training or external knowledge.
- 85% — failure cases shift toward text
- 89% — successful cases shift toward vision
- 4-9% — absolute gains on conflict tasks
- 5 — MLLM families evaluated
Core contributions extracted from the paper introduction and method sections.
Introduces Modal Dominance Ratio (MDR) to trace visual-versus-textual preference across model depth and identify late-layer textual override as a characteristic failure pattern; see the sketch after this list.
CALRD detects harmful override through anchor confidence and prediction retention, then adaptively restores transition-layer logits without external knowledge or finetuning.
Evaluates five MLLMs on Conflict-VQA and PhD-icc, with standard hallucination checks on POPE and CHAIR to verify that improvements do not degrade non-conflict behavior.
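The page does not spell out how MDR is computed. As a purely illustrative reading, assuming per-layer next-token distributions obtained by applying the unembedding to each layer's hidden state (a logit-lens-style readout), one might compare the probability mass on the annotated visual answer against the textual one at every depth; the function name and arguments below are hypothetical.

```python
import numpy as np

def modal_dominance_ratio(layer_probs: np.ndarray,
                          visual_token: int,
                          textual_token: int,
                          eps: float = 1e-9) -> np.ndarray:
    """Hypothetical per-layer MDR: mass on the image-supported answer
    divided by mass on the text-suggested answer.

    layer_probs: (num_layers, vocab_size) next-token distributions.
    visual_token / textual_token: first tokens of the annotated
    visual and textual answers. Values > 1 indicate visual dominance;
    a late drop below 1 is the textual-override signature.
    """
    return (layer_probs[:, visual_token] + eps) / (layer_probs[:, textual_token] + eps)
```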
MLLM failures under conflict arise after visual evidence has already appeared in intermediate representations.
Failure cases often contain the correct visual answer in intermediate layers, then lose it as late layers move probability mass toward the text-suggested answer.
High transition-layer confidence plus low final-layer retention signals that a confident prediction was suppressed before output.
CALRD recovers what the model already knew but failed to preserve.
After the stability onset, CALRD locates the adjacent-layer distributional shift with maximum Jensen-Shannon divergence.
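A minimal sketch of this localization step, assuming the same per-layer distributions as above; `onset` stands in for the stability-onset layer, and the function name is illustrative.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def find_transition_layer(layer_probs: np.ndarray, onset: int) -> int:
    """Return the layer l* whose shift to layer l*+1 has maximal JSD."""
    # scipy's jensenshannon returns the JS distance (sqrt of the divergence);
    # squaring recovers the divergence, and argmax is unchanged either way.
    jsd = [jensenshannon(layer_probs[l], layer_probs[l + 1]) ** 2
           for l in range(onset, len(layer_probs) - 1)]
    return onset + int(np.argmax(jsd))
```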
Anchor confidence ($D_{\text{conf}}$) measures the model's confidence in the transition-layer top prediction; prediction retention ($\rho$) measures whether that prediction survives to the final layer.
The intervention strength is $\lambda = D_{\text{conf}} \times (1 - \rho)$, so confident predictions are restored only when they are being suppressed.
Adjusted logits: $\phi' = (1 - \lambda)\,\phi_{\text{final}} + \lambda\,\phi_{\text{transition}}$
When retention $\rho$ is high, $\lambda$ stays near zero and CALRD leaves the model output largely unchanged.
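Putting the pieces together, a minimal sketch of the restoration step, assuming `final_logits` and `transition_logits` are vocabulary logits read out at the final and transition layers; treating retention as the final-layer probability of the transition-layer top token is our assumption, as is every name below.

```python
import torch
import torch.nn.functional as F

def calrd_adjust(final_logits: torch.Tensor,
                 transition_logits: torch.Tensor) -> torch.Tensor:
    """Blend logits with strength lambda = D_conf * (1 - rho)."""
    p_trans = F.softmax(transition_logits, dim=-1)
    p_final = F.softmax(final_logits, dim=-1)

    anchor = p_trans.argmax(dim=-1, keepdim=True)  # transition-layer top token
    d_conf = p_trans.gather(-1, anchor)            # anchor confidence D_conf
    rho = p_final.gather(-1, anchor)               # retention rho (assumed readout)

    lam = d_conf * (1.0 - rho)                     # lambda = D_conf * (1 - rho)
    # phi' = (1 - lambda) * phi_final + lambda * phi_transition
    return (1.0 - lam) * final_logits + lam * transition_logits
```

With the numbers the formula implies, $D_{\text{conf}} = 0.9$ and $\rho = 0.1$ give $\lambda = 0.81$, a strong restoration, while $\rho = 0.9$ gives $\lambda = 0.09$ and the output stays essentially vanilla.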
A diagnostic benchmark with explicit visual and textual answer annotations for mechanistic analysis.
| Model | Competent subset size |
|---|---|
| InstructBLIP | 2,017 |
| LLaVA-1.5 | 1,212 |
| LLaVA-1.6 | 2,506 |
| Qwen2.5-VL | 3,940 |
| Qwen3-VL | 5,252 |
CALRD improves conflict resolution across model families and decoding strategies while maintaining standard performance.
Absolute accuracy gains across five MLLMs.
LLaVA-1.5 greedy accuracy rises from 27.72 to 36.35.
InstructBLIP greedy CHAIR$_i$ drops from 23.7 to 14.6 (lower is better).
| Model | Conflict-VQA (Vanilla) | Conflict-VQA (CALRD) | PhD-icc (Vanilla) | PhD-icc (CALRD) |
|---|---|---|---|---|
| InstructBLIP | 39.61 | 44.12 (+4.5) | 41.08 | 49.62 (+8.5) |
| LLaVA-1.5 | 36.20 | 41.90 (+5.7) | 27.72 | 36.35 (+8.6) |
| LLaVA-1.6 | 42.35 | 49.51 (+7.2) | 28.23 | 36.52 (+8.3) |
| Qwen2.5-VL | 65.61 | 70.15 (+4.5) | 51.30 | 56.20 (+4.9) |
| Qwen3-VL | 75.58 | 77.42 (+1.8) | 60.32 | 63.68 (+3.4) |
Numbers are extracted from the paper's Table 1. Parentheses show absolute improvement over vanilla greedy decoding.
| Configuration | Conflict-VQA Acc. | POPE Acc. |
|---|---|---|
| Vanilla | 42.3 | 85.4 |
| CALRD Full | 49.5 | 88.2 |
| w/o $D_{\text{conf}}$ | 47.3 | 87.5 |
| w/o $\rho$ | 45.1 | 86.6 |
| Fixed layer | 45.9 | 87.1 |
Both detection signals and dynamic transition-layer selection contribute to the improvement.
Representative visualizations used in the project page.
For questions about CALRD or multimodal knowledge conflict analysis, please contact the corresponding author.
Please cite the paper if CALRD is useful for your research.
@inproceedings{li2026calrd,
title = {MLLMs Get It Right, Then Get It Wrong: Tracing and
Correcting Late-Layer Textual Bias},
author = {Li, Xingming and Cheng, Ao and Sun, Qiyao and He, Xixiang
and Ji, Xuanyu and Huang, Runke and Hu, Qingyong},
booktitle = {Proceedings of the Thirty-Fifth International Joint
Conference on Artificial Intelligence},
year = {2026},
note = {IJCAI-ECAI 2026}
}