Dating texts by multi-class classification with sliding time intervals

Gregory Toner, Xiwu Han

Research output: Contribution to conference (Paper)

Abstract

We propose a practical method for dating texts by classification with sliding time intervals (STI). The method builds on the strengths of multi-class text classification while exploiting temporal characteristics of the training corpus. We ran extensive experiments on English and medieval Irish texts. The results show that STI dating significantly outperforms classifiers with fixed time intervals (FTI). Naïve Bayes Multinomial (NBM) with STI achieved state-of-the-art dating precision on DTE Subtask 2 while using only character and word n-gram features. Experiments on dating long documents, together with further analysis, also point to promising directions for text dating research and for other fields in the humanities.
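
The abstract does not spell out how the time intervals are constructed, so the following is only a minimal sketch of one possible reading of the FTI/STI contrast, not the authors' implementation. It assumes that fixed intervals partition the date range into non-overlapping classes, while sliding intervals are overlapping windows of the same width whose start advances by a smaller step. The function names, the `width` and `step` parameters, and the example years are all hypothetical; the classifier itself (for instance NBM over character and word n-grams) is not shown.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (inclusive start year, inclusive end year)


def fixed_intervals(start: int, end: int, width: int) -> List[Interval]:
    """Non-overlapping intervals of `width` years covering [start, end]."""
    return [(lo, min(lo + width - 1, end)) for lo in range(start, end + 1, width)]


def sliding_intervals(start: int, end: int, width: int, step: int) -> List[Interval]:
    """Overlapping intervals of `width` years whose start advances by `step` years.

    Assumption: this overlapping-window scheme is an illustration only.
    """
    last_start = max(start, end - width + 1)
    return [(lo, min(lo + width - 1, end)) for lo in range(start, last_start + 1, step)]


def labels_for_year(year: int, intervals: List[Interval]) -> List[int]:
    """Indices of every interval (candidate class) that contains `year`."""
    return [i for i, (lo, hi) in enumerate(intervals) if lo <= year <= hi]


fti = fixed_intervals(1700, 1799, width=50)
sti = sliding_intervals(1700, 1799, width=50, step=25)
print(fti)                         # two disjoint classes
print(sti)                         # three overlapping classes
print(labels_for_year(1740, fti))  # a year falls in exactly one fixed class
print(labels_for_year(1740, sti))  # but may fall in several sliding classes
```

Under this reading, a text from a year near a fixed-interval boundary is covered by more than one sliding class, which is one way a classifier could draw on the temporal structure of the training corpus.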
Original language: English
Pages: 1
Number of pages: 6
DOIs: https://doi.org/10.1109/CISP-BMEI.2017.8302102
Publication status: Published - 27 Feb 2018
Event: International Congress on Image and Signal Processing, BioMedical Engineering and Informatics - Shanghai, China
Duration: 14 Oct 2017 to 16 Oct 2017

Conference

Conference: International Congress on Image and Signal Processing, BioMedical Engineering and Informatics
Abbreviated title: CISP-BMEI 2017
Country: China
City: Shanghai
Period: 14/10/2017 to 16/10/2017

Keywords

  • Bayes methods
  • Naïve Bayes Multinomial
  • sliding time intervals
  • medieval Irish
  • text dating
  • machine learning
  • annals

Cite this

Toner, G., & Han, X. (2018). Dating texts by multi-class classification with sliding time intervals. 1. Paper presented at International Congress on Image and Signal Processing, BioMedical Engineering and Informatics, Shanghai, China. https://doi.org/10.1109/CISP-BMEI.2017.8302102