In many applications in applied statistics researchers reduce the complexity of a data set by combining a group of variables into a single measure using factor analysis or an index number. We argue that such compression loses information if the data actually has high dimensionality. We advocate the use of a non-parametric estimator, commonly used in physics (the Takens estimator), to estimate the correlation dimension of the data prior to compression. The advantage of this approach over traditional linear data compression approaches is that the data does not have to be linearized. Applying our ideas to the United Nations Human Development Index we find that the four variables that are used in its construction have dimension three and the index loses information.
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty