Addressing investor concerns: a Chinese financial question-answering benchmark with LLM-based evaluation

Yujian Gan, Yiyi Tao, Jiawang Mo, Xianzheng Huang, Yiwen Li, Kexin Wang, Yi Cai, Lu Liang, Shuzhen Xiong, Qi Ke*, Hua Zheng, Xiaochun Hu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

In recent years, large language models (LLMs) have shown impressive performance across various natural language processing tasks and are increasingly adopted in high-stakes fields such as financial analysis. However, their effectiveness in Chinese financial contexts is hindered by the scarcity of high-quality, domain-specific datasets. To bridge this gap, we present the Chinese Financial Question Answering (CFQA) dataset, a novel resource designed to advance research in financial analysis. CFQA is constructed from publicly available annual reports of multiple Chinese listed companies, paired with corresponding questions and human-annotated answers. Evaluation results reveal that existing QA methods perform poorly on this dataset. CFQA introduces several unique challenges: (1) source documents are in PDF format with complex tabular structures, making information extraction difficult; (2) the length and intricacy of financial reports complicate answer retrieval; and (3) the questions are tightly focused on domain-specific financial content.

Original language: English
Article number: 6
Number of pages: 22
Journal: EPJ Data Science
Volume: 15
Issue number: 1
Early online date: 18 Dec 2025
Publication status: Published - 22 Jan 2026

Keywords

  • Retrieval-augmented generation
  • Financial benchmark datasets
  • Large language models
  • Financial natural language processing
