Using Reinforcement Learning in Solving Exam Timetabling Problems

  • Kehan Han

Student thesis: Doctoral ThesisDoctor of Philosophy


There is increasing interest in the area of research involving automated technologies for solving problems. High generality and ease of reusability have become a major goal in the development of Search methodologies within automation. One search method that meets these requirements is the hyper-heuristic. A hyper-heuristic is a unique method as it explores the space of low-level heuristics rather than specific problem domains. Two common types of hyper-heuristic have been proposed in the literature: selection hyper-heuristics and generation hyper-heuristics. In this thesis, the research focus is concentrated on selection hyper-heuristics, using a single point base search in the selection and application of suitable low-level heuristics to the target problem domain during optimization. The selection process includes a learning ability, with the performance of all selected heuristics recorded and updated during the search process.
The learning ability is a major focal point within the study of hyper-heuristic application in this thesis. It is enhanced through introducing Reinforcement Learning algorithms into the design of the proposed single point-based selection hyper-heuristic framework. Various Reinforcement Learning techniques are abstracted and designed as heuristic selection methods and utility update methods. The goal of this part of the research is to create a way to give machine learning more generality and applicability by embedding it into a hyper-heuristic. The learning scheme of the Hyper-heuristic is also studied thoroughly through this multiple component design. Furthermore, a cooperative hyper-heuristic based on previous research results is proposed and implemented. This cooperative framework manages multiple hyper-heuristics with different local settings to perform parallel search to solve optimization problems. Different hyper-heuristic designs have varying degrees of qualitative performance as target problems change. Rather than aiming to determine a single best hyper-heuristic configuration, the construction of a set of hyper-heuristics with varied parameter settings forms the goal of the learning process. Through the cooperative parallel search, the hyper-heuristic search performance can be improved, allowing a consistent and effective level of generality. The core goal of this research is in achieving this level of generality with the hyper-heuristic approach.
Two general approaches are proposed in particular within this work: Reinforcement Learning Simulated Annealing hyper-heuristic framework and Cooperative hyper-heuristic framework. Both frameworks are implemented and tested using examination timetabling problems as a case study. Examination timetabling is a classic optimization and scheduling problem which has attracted the attention of researchers from various fields. The benchmark selected for the purposes of experimentation and qualitative comparison is the ITC2007 datasets, which closely approximate real-world scheduling problems.
As outlined within this work, the two proposed frameworks have been able to consistently solve all instances of the chosen ITC2007 exam timetabling benchmark datasets, despite the varied characteristics between instances. Furthermore, the proposed cooperative hyper-heuristic managed to deliver competitive results in solving the ITC2007 problem, as compared to current hyper-heuristics currently published in the literature. This work shows that the performance of the hyper-heuristics is improved by applying a cooperative hyper-heuristic approach, while maintaining generality over many instances of the problem. In this research, Reinforcement Learning is embedded into the general hyper-heuristic framework, improving the generality and applicability of machine learning techniques when used to solve complex scheduling and optimisation problems.
Date of AwardOct 2018
Original languageEnglish
Awarding Institution
  • Queen's University Belfast
SupervisorPaul McMullan (Supervisor) & Barry McCollum (Supervisor)

Cite this