Continuing advances in data storage and communication technologies have led to an explosive growth in digital music collections. To cope with their increasing scale, we need effective Music Information Retrieval (MIR) capabilities like tagging, concept search and clustering. Integral to MIR is a framework for modelling music documents and generating discriminative signatures for them. In this paper, we introduce a multimodal, layered learning framework called DMCM. Distinguished from the existing approaches that encode music as an ensemble of order-less feature vectors, our framework extracts from each music document a variety of acoustic features, and translates them into low-level encodings over the temporal dimension. From them, DMCM elucidates the concept dynamics in the music document, representing them with a novel music signature scheme called Stochastic Music Concept Histogram (SMCH) that captures the probability distribution over all the concepts. Experiment results with two large music collections confirm the advantages of the proposed framework over existing methods on various MIR tasks.