Thesis defense: Johannes JURGOVSKY
Defense under an international joint supervision (cotutelle) agreement between the University of Passau (Passau, Germany) and INSA Lyon
Doctoral candidate: Johannes Jurgovsky
INSA laboratory: LIRIS - DRIM
Doctoral school: ED512 InfoMaths
In this thesis, we study credit card fraud detection, a specific problem in the electronic payment sector with high practical relevance. We address several of its intricate challenges with methods from machine learning, with the goal of identifying fraudulent transactions illegitimately issued on behalf of the rightful card owner. In particular, we explore several ways to leverage contextual information beyond a transaction's basic attributes at the transaction level, the sequence level and the user level.
At the transaction level, we aim to identify fraudulent transactions whose attribute values make them globally distinguishable from genuine transactions. We provide an empirical study of how class imbalance and the forecasting horizon influence the classification performance of a random forest classifier. We also leverage external knowledge sources to augment transactions with additional features that help discriminate global frauds from the genuine background. Our results show that external information about countries and calendar events can improve classification performance, most noticeably on card-not-present transactions.
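To make transaction-level augmentation more concrete, here is a minimal Python sketch that derives calendar and country features from external lookup tables. The holiday list, the `CONTINENT` table, the `Transaction` fields and the chosen features (weekend flag, distance to the nearest holiday, cross-continent flag) are all assumptions for illustration; the thesis does not necessarily use these exact sources or features.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical external knowledge sources (assumptions for illustration):
# a tiny holiday calendar and a country -> continent lookup table.
HOLIDAYS = [date(2015, 12, 25), date(2016, 1, 1)]
CONTINENT = {"BE": "Europe", "FR": "Europe", "US": "North America"}


@dataclass
class Transaction:
    amount: float
    day: date
    card_country: str
    merchant_country: str


def augment(tx: Transaction) -> dict:
    """Return the basic attributes plus calendar- and country-derived features."""
    card_cont = CONTINENT.get(tx.card_country, "unknown")
    merch_cont = CONTINENT.get(tx.merchant_country, "unknown")
    return {
        "amount": tx.amount,
        # Calendar features: weekend flag and distance (in days) to the nearest holiday
        "is_weekend": tx.day.weekday() >= 5,
        "days_to_holiday": min(abs((tx.day - h).days) for h in HOLIDAYS),
        # Country features: merchant continent and whether it differs from the card's
        "merchant_continent": merch_cont,
        "cross_continent": card_cont != merch_cont,
    }


# A Belgian card used at a US merchant on Thursday, 24 December 2015
features = augment(Transaction(42.0, date(2015, 12, 24), "BE", "US"))
print(features)
```

In a full pipeline, such dictionaries would be vectorized and passed, together with the basic attributes, to a classifier such as the random forest mentioned above; the lookup tables here are deliberately tiny.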
At the sequence level, we aim to detect the class of frauds that are inconspicuous against the background of all transactions but peculiar with respect to the short-term sequence in which they appear. We use a Long Short-Term Memory network (LSTM) to model the sequential succession of transactions and contrast it with state-of-the-art feature engineering methods that summarize transaction sequences in the form of feature aggregates. Our results show that LSTM-based modeling is a promising strategy for characterizing sequences of card-present transactions, but it is not adequate for card-not-present transactions.
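As an illustration of sequence-level modeling, the following pure-Python sketch runs a standard LSTM cell over a card's recent transactions (oldest first) and maps the final hidden state to a fraud probability for the most recent one. The weights are random and untrained, the three per-transaction input features are hypothetical, and `LSTMCell` and `score_sequence` are illustrative helpers rather than the architecture used in the thesis; a real model would be trained with a deep learning framework.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMCell:
    """Minimal LSTM cell with input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        n = input_size + hidden_size

        def init():
            return [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(hidden_size)]

        # One weight matrix per gate, applied to the concatenation [x_t, h_{t-1}]
        self.W = {k: init() for k in "ifog"}
        self.b = {k: [0.0] * hidden_size for k in "ifog"}
        self.hidden_size = hidden_size

    def step(self, x, h, c):
        z = list(x) + list(h)
        pre = {k: [s + bias for s, bias in zip(matvec(self.W[k], z), self.b[k])] for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        # Cell state: keep part of the old memory, add part of the new candidate
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def score_sequence(cell, seq, w_out, b_out=0.0):
    """Run the cell over a transaction sequence and score its last transaction."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for x in seq:
        h, c = cell.step(x, h, c)
    return sigmoid(sum(w * v for w, v in zip(w_out, h)) + b_out)


cell = LSTMCell(input_size=3, hidden_size=4, seed=42)
# Hypothetical features per transaction: scaled amount, card-present flag, hour / 24
seq = [[0.2, 1.0, 0.40], [0.5, 1.0, 0.45], [3.0, 0.0, 0.10]]
p = score_sequence(cell, seq, w_out=[0.5, -0.5, 0.5, -0.5])
print(f"fraud score: {p:.3f}")
```

Unlike a fixed set of aggregates, the hidden state is updated transaction by transaction, so the order of the sequence influences the score.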
At the user level, we elaborate on feature aggregates by proposing a flexible concept that makes it possible to describe a broad spectrum of such features. We provide a CUDA-based implementation of the computationally expensive extraction that achieves a speed-up of two orders of magnitude. Our feature selection study reveals that aggregates extracted from users' transaction sequences are more useful than those extracted from merchant sequences. Moreover, we discover multiple sets of candidate features that perform on par with manually engineered aggregates while being vastly different in structure.
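One way to picture a flexible description of feature aggregates is a small specification object: whose history to look at (card or merchant), how far back, which attribute to aggregate, with which function, and optionally which attribute must match the current transaction. The hypothetical `AggregateSpec` and `extract` names below are assumptions for illustration, not the formalism proposed in the thesis. The naive quadratic loop also hints at why evaluating many such specifications over millions of transactions calls for a parallel (e.g. GPU) implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class AggregateSpec:
    """Hypothetical description of one feature aggregate."""
    key: str                      # entity whose history is used, e.g. "card_id" or "merchant_id"
    window: float                 # look-back window in hours
    value: str                    # attribute to aggregate, e.g. "amount"
    agg: Callable[[list], float]  # aggregation function, e.g. sum, len, max
    match: Optional[str] = None   # attribute that must equal the current transaction's value


def extract(transactions, spec):
    """Evaluate spec for each transaction using earlier transactions of the same entity.

    transactions: list of dicts sorted by "time" (in hours).
    Returns 0.0 when no earlier transaction qualifies.
    """
    out = []
    for i, tx in enumerate(transactions):
        vals = [
            prev[spec.value]
            for prev in transactions[:i]
            if prev[spec.key] == tx[spec.key]
            and tx["time"] - prev["time"] <= spec.window
            and (spec.match is None or prev[spec.match] == tx[spec.match])
        ]
        out.append(spec.agg(vals) if vals else 0.0)
    return out


txs = [
    {"time": 0.0, "card_id": "A", "merchant_id": "m1", "mcc": "5411", "amount": 20.0},
    {"time": 5.0, "card_id": "A", "merchant_id": "m2", "mcc": "5411", "amount": 30.0},
    {"time": 10.0, "card_id": "B", "merchant_id": "m1", "mcc": "5812", "amount": 15.0},
    {"time": 20.0, "card_id": "A", "merchant_id": "m1", "mcc": "5411", "amount": 50.0},
]
# User-level aggregate: amount spent by the same card in the same merchant category in 24 h
by_card_same_mcc = AggregateSpec(key="card_id", window=24.0, value="amount", agg=sum, match="mcc")
# Merchant-level aggregate: number of earlier transactions at the same merchant in 24 h
count_by_merchant = AggregateSpec(key="merchant_id", window=24.0, value="amount", agg=len)
print(extract(txs, by_card_same_mcc))   # [0.0, 20.0, 0.0, 50.0]
print(extract(txs, count_by_merchant))  # [0.0, 0.0, 1, 2]
```

Because each aggregate is just a combination of a few fields, a space of candidate features can be enumerated by varying `key`, `window`, `agg` and `match`, which is the kind of space a feature selection study can search.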
Regarding future work, we motivate the use of simple and transparent machine learning methods for credit card fraud detection and sketch a simple user-focused modeling approach.