CVPR 2026

ENC-Bench: A Benchmark for Evaluating Multimodal Large Language Models in Electronic Navigational Chart Understanding


1 National University of Defense Technology 2 Intelligent Game and Decision Lab 3 The Chinese University of Hong Kong
CVPR 2026 · Denver, Colorado
Teaser — Overview of ENC-Bench benchmark structure, tasks, and evaluated models
Overview of ENC-Bench. Our benchmark encompasses 840 authentic NOAA Electronic Navigational Charts and 20,490 expert-validated QA samples spanning a three-level task hierarchy: Perception, Spatial Reasoning, and Maritime Decision-Making. We evaluate 10 state-of-the-art MLLMs under a unified zero-shot protocol.

Abstract

Electronic Navigational Charts (ENCs) are the safety-critical backbone of modern maritime navigation, yet it remains unclear whether multimodal large language models (MLLMs) can reliably interpret them. Unlike natural images or conventional charts, ENCs encode regulations, bathymetry, and route constraints via standardized vector symbols, scale-dependent rendering, and precise geometric structure — requiring specialized maritime expertise for interpretation.

We introduce ENC-Bench, the first benchmark dedicated to professional ENC understanding. ENC-Bench contains 20,490 expert-validated samples from 840 authentic NOAA ENCs, organized into a three-level hierarchy: Perception (symbol and feature recognition), Spatial Reasoning (coordinate localization, bearing, distance), and Maritime Decision-Making (route legality, safety assessment, emergency planning under multiple constraints). All samples are generated from raw S-57 data through a calibrated vector-to-image pipeline with automated consistency checks and expert review.
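The Spatial Reasoning level asks models for bearings and great-circle distances between charted positions. As a point of reference (not part of the benchmark pipeline itself), the standard formulas such answers reduce to can be sketched in a few lines; the function names here are illustrative:

```python
import math

def haversine_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance in nautical miles between two WGS-84 points."""
    r_nm = 3440.065  # mean Earth radius in nautical miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r_nm * math.asin(math.sqrt(a))

def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial true bearing (degrees, 0-360) from point 1 to point 2."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
```

For example, one degree of longitude along the equator is roughly 60 nautical miles due east (bearing 090°), which is the kind of answer the distance and bearing tasks expect.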

We evaluate 10 state-of-the-art MLLMs, including GPT-4o, Gemini 2.5, Qwen3-VL, InternVL3, and GLM-4.5V, under a unified zero-shot protocol. The best model achieves only 47.88% accuracy, with systematic failures in symbolic grounding, spatial computation, multi-constraint reasoning, and robustness to lighting and scale variations. By establishing the first rigorous ENC benchmark, we open a new research frontier at the intersection of specialized symbolic reasoning and safety-critical AI.

Dataset Highlights

20,490 Samples
840 NOAA Charts
10 Task Types
10 MLLMs Evaluated

Data Generation Pipeline

840 authentic NOAA charts are processed through a calibrated S-57 vector-to-image pipeline with automated consistency checks and expert validation.

ENC-Bench data generation pipeline from raw S-57 NOAA charts to expert-validated QA pairs
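The paper does not spell out the individual consistency checks here, but as an illustration of the kind of automated check the pipeline applies, one might verify that every generated QA sample's answer coordinate falls inside the chart tile it was rendered from. The field names below are hypothetical, not the benchmark's actual schema:

```python
def coord_in_tile(sample):
    """Return True if the sample's answer coordinate lies inside its tile bounds.

    Hypothetical sample layout:
      {'answer': {'lat': ..., 'lon': ...},
       'tile_bounds': {'min_lat': ..., 'max_lat': ..., 'min_lon': ..., 'max_lon': ...}}
    """
    lat = sample['answer']['lat']
    lon = sample['answer']['lon']
    b = sample['tile_bounds']
    return (b['min_lat'] <= lat <= b['max_lat']
            and b['min_lon'] <= lon <= b['max_lon'])
```

Samples failing such checks would be flagged for regeneration or expert review rather than entering the benchmark.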

ENC vs. Conventional Maps

Electronic Navigational Charts encode safety-critical maritime data — bathymetry, navigation aids, regulatory zones — unavailable in consumer mapping services. Drag the slider to compare.

Google Maps (standard view)

NOAA Electronic Navigational Chart (ENC)

ENC Lighting Modes

ENC-Bench covers three official lighting modes used in real maritime navigation systems.

Day Mode

Dusk Mode

Night Mode

Every chart scene in ENC-Bench is rendered in all three official IEC 61174 lighting modes (Day, Dusk, Night), yielding 6,830 samples per mode.

Dataset Comparison

ENC-Bench is the first benchmark specifically designed for ENC understanding, significantly surpassing prior maritime and chart datasets in scale and task diversity.

Comparison of ENC-Bench with prior maritime and chart understanding datasets
Figure 2. Comparison of ENC-Bench with prior datasets. ENC-Bench uniquely combines large scale, hierarchical task design, and expert-validated annotations rooted in IHO S-57 standards.

Dataset Statistics

20,490 samples spanning 10 task categories, 3 lighting modes, and 6 scale levels, covering 384 IHO S-57 standardized symbol types.

Standardized IHO S-57 Symbology Types covered in ENC-Bench
Figure 3. 384 IHO S-57 standardized symbol types covered by ENC-Bench, spanning buoys, depth contours, hazard markers, and navigational aids.
Key statistics of ENC-Bench — task distribution and sample counts
Figure 4. Sample distribution across 10 task categories, 3 lighting modes (Day / Dusk / Night), and 6 scale levels (1:50k – 1:300k).

Benchmark Results

Zero-shot performance of 10 state-of-the-art MLLMs across all ENC-Bench task categories. The best model (Gemini-2.5-Pro) achieves only 47.88% overall accuracy, revealing significant room for improvement.

Benchmark results — accuracy of 10 MLLMs across all ENC-Bench task categories
Figure 5. Performance of 10 state-of-the-art MLLMs on ENC-Bench tasks. All models struggle significantly on Spatial Reasoning and Maritime Decision-Making, revealing critical gaps in specialized symbolic understanding.
Error distribution analysis of Gemini-2.5-Pro on ENC-Bench
Figure 6. Error distribution analysis of the best-performing model (Gemini-2.5-Pro), identifying symbolic grounding failures and multi-constraint reasoning deficiencies as the dominant error modes.

Citation

If you find ENC-Bench useful in your research, please cite our paper:

BibTeX
@inproceedings{cheng2026encbench,
  title     = {ENC-Bench: A Benchmark for Evaluating Multimodal Large
               Language Models in Electronic Navigational Chart Understanding},
  author    = {Cheng, Ao and Li, Xingming and Ji, Xuanyu and He, Xixiang
               and Sun, Qiyao and Qiu, Chunping and Huang, Runke and Hu, Qingyong},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision
               and Pattern Recognition (CVPR)},
  year      = {2026}
}

Authors

Ao Cheng · NUDT

Xingming Li · NUDT

Xuanyu Ji · NUDT

Xixiang He · NUDT

Qiyao Sun · NUDT

Chunping Qiu · Intelligent Game & Decision Lab

Runke Huang · CUHK

Qingyong Hu · Intelligent Game & Decision Lab · Corresponding author
huqingyong15@outlook.com

Contact

For questions about the paper, dataset, or benchmark, please reach out to the corresponding author.

Qingyong Hu

Corresponding Author  ·  Intelligent Game and Decision Lab

huqingyong15@outlook.com

We welcome questions about the ENC-Bench dataset, the benchmark evaluation framework, and potential collaboration opportunities in safety-critical AI and maritime intelligence. For early dataset access, please reach out directly.
