Key results on hallucination detection benchmarks
Adaptive Bayesian Estimation with Guided Semantic Exploration
Models the semantic distribution via a Dirichlet prior over semantic categories. Marginalizes over the unknown number of meanings K to compute E[h|D] and Var[h|D] with tighter posterior bounds through generation probability constraints.
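To make the marginalization over K concrete, here is a minimal sketch. It assumes that h is the entropy of the semantic distribution, that the prior is a symmetric Dirichlet(α), and that K has a uniform prior up to a cap `k_max`. None of these choices are stated above, and the generation probability constraints are omitted. The function names are hypothetical, and the moments are estimated by Monte Carlo rather than in closed form.

```python
import math
import random

def log_weight_K(counts, K, alpha):
    """Unnormalized log posterior weight of K given observed cluster counts
    (symmetric Dirichlet(alpha) prior, uniform prior over K: assumptions)."""
    k, n = len(counts), sum(counts)
    # Number of ways to map the k observed clusters onto K labelled categories.
    log_assign = math.lgamma(K + 1) - math.lgamma(K - k + 1)
    # Dirichlet-multinomial likelihood of one labelled sequence of answers.
    log_dm = math.lgamma(K * alpha) - math.lgamma(K * alpha + n)
    log_dm += sum(math.lgamma(alpha + c) - math.lgamma(alpha) for c in counts)
    return log_assign + log_dm

def sample_entropy(counts, K, alpha, rng):
    """Draw p ~ Dirichlet(alpha + counts) over K categories, return H(p) in nats."""
    params = [alpha + c for c in counts] + [alpha] * (K - len(counts))
    g = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(g)
    return -sum((x / total) * math.log(x / total) for x in g if x > 0)

def posterior_entropy(counts, k_max=10, alpha=1.0, draws=2000, seed=0):
    """Monte Carlo estimate of E[h|D] and Var[h|D], marginalized over K."""
    rng = random.Random(seed)
    ks = list(range(len(counts), k_max + 1))
    logw = [log_weight_K(counts, K, alpha) for K in ks]
    top = max(logw)
    weights = [math.exp(lw - top) for lw in logw]  # rng.choices normalizes
    samples = []
    for _ in range(draws):
        K = rng.choices(ks, weights=weights)[0]    # K ~ p(K | D)
        samples.append(sample_entropy(counts, K, alpha, rng))
    mean = sum(samples) / draws
    var = sum((s - mean) ** 2 for s in samples) / (draws - 1)
    return mean, var
```

For instance, `posterior_entropy([5])` describes five answers that share one meaning, while `posterior_entropy([2, 2, 1])` describes five answers split across three meanings. The second call yields the larger posterior mean.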
Dynamically terminates sampling once the posterior variance falls below a threshold γ. Simple queries converge quickly, while complex queries receive additional exploration, achieving ≈50% sample reduction in low-budget settings.
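The stopping rule can be sketched as a loop that draws one answer at a time and re-checks the posterior variance. This is an illustration under simplifying assumptions, not the authors' implementation. The number of categories `k` is fixed, the variance is a Monte Carlo estimate under a Dirichlet posterior, and `draw_cluster` is a hypothetical callable standing in for "generate one answer and return the index of its semantic cluster". The defaults for `gamma`, `n_min`, `n_max`, and `alpha` are placeholders.

```python
import math
import random

def entropy_moments(counts, alpha, draws, rng):
    """Monte Carlo mean and variance of H(p) for p ~ Dirichlet(alpha + counts)."""
    params = [alpha + c for c in counts]
    samples = []
    for _ in range(draws):
        g = [rng.gammavariate(a, 1.0) for a in params]
        total = sum(g)
        samples.append(-sum((x / total) * math.log(x / total) for x in g if x > 0))
    mean = sum(samples) / draws
    var = sum((s - mean) ** 2 for s in samples) / (draws - 1)
    return mean, var

def adaptive_sample(draw_cluster, k, gamma=0.01, n_min=2, n_max=10,
                    alpha=0.5, draws=500, seed=0):
    """Draw answers one at a time; stop once the posterior variance < gamma."""
    rng = random.Random(seed)
    counts = [0] * k
    mean = var = float("nan")
    used = 0
    for used in range(1, n_max + 1):
        counts[draw_cluster()] += 1        # one new generation, already clustered
        if used < n_min:
            continue                       # too few samples to judge convergence
        mean, var = entropy_moments(counts, alpha, draws, rng)
        if var < gamma:
            break                          # posterior is tight enough: stop early
    return used, mean, var
```

The call returns the number of samples actually used together with the final posterior mean and variance, so the budget spent per query can be logged alongside the uncertainty estimate.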
Identifies semantically critical tokens via importance weights and systematically perturbs them to discover diverse interpretations. Importance sampling maintains unbiased estimates while accelerating variance convergence.
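The unbiasedness claim rests on the standard importance sampling identity: if answers are drawn from a proposal q instead of the model distribution p, reweighting each draw by p(x)/q(x) recovers expectations under p. The sketch below uses a toy three-meaning distribution. The values of `p` and `q` and the function name are illustrative assumptions, not quantities from the paper, and the token perturbation step that would produce q is not modeled.

```python
import random

def importance_estimate(f, p, q, n, rng):
    """Estimate E_p[f(X)] from n draws of X ~ q, reweighting by p(x)/q(x)."""
    xs = list(q)
    draws = rng.choices(xs, weights=[q[x] for x in xs], k=n)
    return sum(f(x) * p[x] / q[x] for x in draws) / n

# Toy target: the model's own distribution over three meanings (assumed values).
p = {"A": 0.90, "B": 0.08, "C": 0.02}
# Toy proposal: guided exploration visits the rare meanings more often.
q = {"A": 0.50, "B": 0.30, "C": 0.20}

def is_minority(x):
    """Indicator that the answer carries a non-dominant meaning."""
    return 0.0 if x == "A" else 1.0

exact = sum(p[x] * is_minority(x) for x in p)   # probability mass on B and C
estimate = importance_estimate(is_minority, p, q, 20000, random.Random(0))
```

In this toy case the per-sample variance of the reweighted estimator is about 0.013, compared with 0.09 for sampling directly from p, which illustrates how a proposal that favors rare meanings can speed up variance convergence without biasing the estimate.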
AUROC as a function of sampling budget N — Llama-3.1-8B
Our method (solid blue) consistently outperforms baselines at every sampling budget N=1–10. At N=2, we achieve an average 12.6% higher AUROC than the strongest baseline.
AUROC on four QA datasets across three LLMs (higher is better)
| Dataset | P(True) | SAR | SE | SE-SDLG | Ours ↑ |
|---|---|---|---|---|---|
Bold values indicate state-of-the-art. Our method achieves the highest AUROC in 23 out of 24 experimental settings.
For questions about the paper, method, or potential collaborations, please contact the corresponding author.
Qingyong Hu
Corresponding Author · Intelligent Game and Decision Lab, Beijing
huqingyong15@outlook.com
We welcome questions about the adaptive Bayesian sampling framework, the guided semantic exploration strategy, and potential collaboration opportunities in LLM hallucination detection and uncertainty quantification. For code or implementation questions, you may also reach out to the first authors at sunqiyao18@nudt.edu.cn or lixingming@nudt.edu.cn.