The past decade had witnessed an unprecedented growth in the amount of available digital content, and its volume is expected to continue to grow the next few years. Unstructured text data generated from web and enterprise sources form a large fraction of such content. Many of these contain large volumes of reusable data such as solutions to frequently occurring problems, and general know-how that may be reused in appropriate contexts. In this work, we address issues around leveraging unstructured text data from sources as diverse as the web and the enterprise within the Case-based Reasoning framework. Case-based Reasoning (CBR) provides a framework and methodology for systematic reuse of historical knowledge that is available in the form of problem solution pairs, in solving new problems. Here, we consider possibilities of enhancing Textual CBR systems under three main themes: procurement, maintenance and retrieval. We adapt and build upon the state of-the-art techniques from data mining and natural language processing in addressing various challenges therein. Under procurement, we investigate the problem of extracting cases (i.e., problem-solution pairs) from data sources such as incident/experience reports. We develop case-base maintenance methods specifically tuned to text targeted towards retaining solutions such that the utility of the filtered case base in solving new problems is maximized. Further, we address the problem of query suggestions for textual case-bases and show that exploiting the problem-solution partition can enhance retrieval effectiveness by prioritizing more useful query suggestions. Additionally, we illustrate interpretable clustering as a tool to drill-down to domain specific text collections (since CBR systems are usually very domain specific) and develop techniques for improved similarity assessment in social media sources such as microblogs. Through extensive empirical evaluations, we illustrate the improvements that we are able to achieve over the state-of-the-art methods for the respective tasks.
|Type||PhD Thesis- Indian Institute of Technology Madras|
|Publication status||Published - Mar 2014|