11 September 2017, 13:30


PhD thesis defense: Guillaume BOSC

Anytime Discovery of a Diverse Set of Patterns with Monte Carlo

PhD candidate: Guillaume BOSC

INSA laboratory: LIRIS
Doctoral school: EDA512: InfoMaths

The discovery of patterns that strongly distinguish one class label from another is still a challenging data-mining task. Subgroup Discovery (SD) is a formal pattern mining framework that enables the construction of intelligible classifiers and, most importantly, the elicitation of interesting hypotheses from the data. However, SD still faces two major issues: (i) how to define appropriate quality measures to characterize the interestingness of a pattern; (ii) how to select an accurate heuristic search technique when exhaustive enumeration of the pattern space is infeasible. The first issue has been tackled by Exceptional Model Mining (EMM), which discovers patterns covering tuples that locally induce a model substantially different from the model of the whole dataset. The second issue has been studied in SD and EMM mainly through beam-search strategies and genetic algorithms that aim to discover a pattern set that is non-redundant, diverse and of high quality. In this thesis, we argue that the greedy nature of most of these previous approaches produces pattern sets that lack diversity. Consequently, we formally define pattern mining as a game and solve it with Monte Carlo Tree Search (MCTS), a recent technique mainly used for games and planning problems in artificial intelligence. Contrary to traditional sampling methods, MCTS leads to an anytime pattern mining approach without assumptions on either the quality measure or the data. It tends towards an exhaustive search if given enough time and memory. The exploration/exploitation trade-off allows the diversity of the result set to be improved considerably compared to existing heuristics. We show that MCTS quickly finds a diverse pattern set of high quality through several applications in neurosciences and game analytics. We also propose and validate a new quality measure especially tuned for imbalanced multi-label data.

Additional information

  • Blaise Pascal building, Room 337 - 9 avenue Jean Capelle, Villeurbanne.