New, automated forms of data analysis are required in order to understand the high-dimensional trajectories that are obtained from molecular dynamics simulations of proteins. Dimensionality reduction algorithms are particularly appealing in this regard, as they allow one to construct unbiased, low-dimensional representations of the trajectory using only the information encoded in the trajectory. The downside of this approach is that a different set of coordinates is required for each chemical system under study, precisely because the coordinates are constructed using information from the trajectory. In this paper we show how one can resolve this problem by using the sketch-map algorithm, which we recently proposed, to construct a low-dimensional representation of the structures contained in the Protein Data Bank (PDB). We show that the resulting coordinates are as useful for analysing trajectory data as coordinates constructed using landmark configurations taken from the trajectory, and that these coordinates can thus be used for understanding protein folding across a range of systems.
Ardevol, A., Palazzesi, F., Tribello, G. A., & Parrinello, M. (2016). General, PDB-based collective variables for protein folding. Journal of Chemical Theory and Computation, 12(1), 29-35. https://doi.org/10.1021/acs.jctc.5b00714