Sequential Anomaly Detection in Highly Imbalanced Data
Author | : Ayman Alazizi |
Release | : 2020 |
OCLC | : 1259439281 |
Book excerpt: Technological development has greatly contributed to the growth of e-commerce and boosted clients' confidence in using their credit cards. However, credit card fraud has expanded as well, resulting in billions of dollars in financial losses, so designing fraud detection systems that reduce these losses is very important. Many researchers are therefore working on fraud detection systems based on advanced machine learning techniques to help fraud investigators detect fraud patterns early. Building machine learning algorithms that identify fraudulent transactions is a challenging task. In this thesis, we highlight several complex challenges that appear in real-world datasets: extremely imbalanced data, i.e. fraudulent transactions represent only a small fraction of all transactions; concept drift resulting from changes in fraudsters' behaviours and buying strategies over time; and the overlap between genuine and fraudulent transactions. We also focus on human error, which is one of the main sources of noisy labels. In addition to these challenges, we show the importance of handcrafted features that summarize sequential information, although such features are costly in both time and money to produce. To overcome these challenges, we propose a new approach that leverages sequential information and handles the imbalanced data problem in order to extract features automatically rather than by hand. Empirical results on real datasets of credit card transactions show that our approach is efficient and accurate, and that it improves the performance of the classification model.