Abstract
The development of the World Wide Web, with easy access to massive
information sources anywhere and anytime, has paved the way for more people to
rely on online news media rather than print media. This shift has accelerated
the growth of the online news industry and created substantial competitive
pressure. In this work, we propose a set of hybrid features for online news
popularity prediction before publication. Two categories of features are
extracted from news articles: conventional features,
comprising metadata, temporal, contextual, and embedding vector features,
and enhanced features, comprising readability, emotion, and
psycholinguistic features. Apart from
analyzing the effectiveness of conventional and enhanced features, we
combine them to form the set of hybrid features. We curate
an Indian news dataset consisting of news articles from top-rated Indian
news websites for the study and also contribute the dataset for future
research. Evaluations are performed on the Indian news dataset (IND) and
compared with performance on the benchmark Mashable dataset using
various supervised machine learning models. Our results indicate that the
proposed hybrid of enhanced and conventional features is highly
effective for online news popularity prediction before publication.
Original language | English |
---|---|
Pages (from-to) | 539-545 |
Journal | IAES International Journal of Artificial Intelligence (IJ-AI) |
Volume | 11 |
Issue number | 2 |
DOIs | |
Publication status | Published - 01 Jun 2022 |
Externally published | Yes |