Automated recognition of mouse behaviours is crucial to the study of psychiatric and neurological diseases. Achieving it requires analysing the temporal dynamics of these behaviours, because transitions between neighbouring actions are swift and occur within short periods. In this paper, we develop and implement a novel Hidden Markov Model (HMM) algorithm to describe the temporal characteristics of mouse behaviours. Specifically, we propose a hybrid deep learning architecture in which the first, unsupervised layer relies on an advanced spatial-temporal segment Fisher Vector (SFV) that encodes both visual and contextual features. Subsequent supervised layers, based on our segment aggregate network (SAN), are trained to estimate the state-dependent observation probabilities of the HMM. The proposed architecture discriminates between visually similar behaviours and achieves high recognition rates, and it copes well with imbalanced mouse behaviour datasets. Finally, we evaluate our approach on JHuang's dataset and our own, and the results show that our method outperforms other state-of-the-art approaches.
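As a rough illustration of how network outputs can serve as state-dependent observation scores in an HMM, the following minimal sketch decodes a short behaviour sequence with the Viterbi algorithm. It rests on several assumptions that are not taken from the paper: the per-segment posteriors are hand-written numbers standing in for SAN outputs, the two states and their transition probabilities are hypothetical, and the posteriors are converted to scaled likelihoods by dividing by class priors, which is a common convention in hybrid network-HMM systems and may differ from the method used here.

```python
import math

def scaled_log_likelihoods(posteriors, priors, floor=1e-12):
    """Convert per-segment class posteriors into scaled log-likelihoods.

    Assumption: p(x | s) is taken proportional to p(s | x) / p(s), the usual
    hybrid network-HMM convention; the paper may use a different scheme.
    """
    return [[math.log(max(p, floor) / q) for p, q in zip(frame, priors)]
            for frame in posteriors]

def viterbi(log_init, log_trans, log_obs):
    """Return the most likely state path under the HMM (log domain).

    log_init[s]     : log initial probability of state s
    log_trans[p][s] : log probability of moving from state p to state s
    log_obs[t][s]   : log observation score of state s at step t
    """
    n_states = len(log_init)
    score = [log_init[s] + log_obs[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at step t
    for t in range(1, len(log_obs)):
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_obs[t][s])
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical two-state example (0 and 1 stand for two behaviours).
priors = [0.5, 0.5]                      # assumed class priors
init = [0.5, 0.5]                        # assumed initial distribution
trans = [[0.9, 0.1], [0.1, 0.9]]         # assumed "sticky" transitions
posteriors = [                           # illustrative stand-ins for SAN outputs
    [0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.9, 0.1],
    [0.1, 0.9], [0.1, 0.9], [0.2, 0.8],
]

log_init = [math.log(p) for p in init]
log_trans = [[math.log(p) for p in row] for row in trans]
log_obs = scaled_log_likelihoods(posteriors, priors)

# Per-segment argmax ignores temporal context; Viterbi uses the transitions.
framewise = [max(range(2), key=lambda s: frame[s]) for frame in posteriors]
path = viterbi(log_init, log_trans, log_obs)
print("framewise:", framewise)
print("viterbi:  ", path)
```

In this toy sequence the third segment weakly favours state 1 on its own, but the transition model makes a single-segment switch unlikely, so the decoded path keeps that segment in state 0. This is the kind of temporal smoothing that an HMM layer contributes on top of per-segment classification.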