Knowledge Extraction from large-scale data set

ali@fuzzywireless.com
Mar 4, 2022
Kazemi and Zarrabi (2017) presented a deep learning approach to fraud detection in credit card transactions. The sample dataset was the German credit card data, which several researchers have used for academic research. The dataset consists of a set of attributes describing people classified as good or bad credit risks. Several machine learning algorithms were compared using accuracy and variance as metrics. The baseline algorithms were k-nearest neighbors (k-NN), a decision tree GP method, and a self-organizing map (SOM), which maps data into a topological space, while Kazemi and Zarrabi (2017) implemented an underfit deep autoencoder (UDAE) and an overfit deep autoencoder (ODAE). ODAE achieved the best accuracy and also the smallest variance among all the algorithms.

The autoencoding technique used by Kazemi and Zarrabi (2017) is similar to a simple multi-layer perceptron, except that it is an unsupervised learning technique. A neural network maps the initial data into a low-dimensional space without disturbing its initial structure, and the training goal is to reproduce the input at the output. Converting the raw data into the low-dimensional representation is termed encoding, while the mirrored structure that reconstructs the original data is termed decoding (Kazemi & Zarrabi, 2017). The main idea is to select the best features for data analysis. In classification, if the input data does not have many dimensions or is not discriminative enough, a sparse autoencoding technique can be used instead. It uses more neurons in the hidden layer than there are input dimensions, so a sparsity constraint is needed to keep the network from learning the trivial solution of simply copying the input.
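To make the encode/decode idea concrete, here is a minimal sketch in plain Python. It is not the network from the paper: it assumes a single linear hidden layer, a squared reconstruction error, and plain stochastic gradient descent. The toy data and all function names (encode, decode, train, reconstruction_error) are made up for illustration.

```python
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def encode(w_enc, x):
    """Map an input vector to its low-dimensional code."""
    return matvec(w_enc, x)


def decode(w_dec, code):
    """Map a code back to the input space."""
    return matvec(w_dec, code)


def reconstruction_error(w_enc, w_dec, data):
    """Mean squared reconstruction error over the dataset."""
    total = 0.0
    for x in data:
        x_hat = decode(w_dec, encode(w_enc, x))
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(data)


def train(data, n_hidden, epochs=300, lr=0.01, seed=0):
    """Fit encoder and decoder weights so that decode(encode(x)) approximates x."""
    rng = random.Random(seed)
    n_in = len(data[0])
    w_enc = make_matrix(n_hidden, n_in, rng)
    w_dec = make_matrix(n_in, n_hidden, rng)
    for _ in range(epochs):
        for x in data:
            h = encode(w_enc, x)
            x_hat = decode(w_dec, h)
            # Error signal; the factor 2 of the squared-error gradient is folded into lr.
            err = [a - b for a, b in zip(x_hat, x)]
            # Backpropagate the error to the hidden code before changing w_dec.
            grad_h = [sum(err[i] * w_dec[i][j] for i in range(n_in))
                      for j in range(n_hidden)]
            for i in range(n_in):
                for j in range(n_hidden):
                    w_dec[i][j] -= lr * err[i] * h[j]
            for j in range(n_hidden):
                for k in range(n_in):
                    w_enc[j][k] -= lr * grad_h[j] * x[k]
    return w_enc, w_dec


# Toy data (an assumption for this sketch): 4 features driven by 2 hidden factors.
data = [[a, b, a + b, a - b] for a in (0.0, 0.5, 1.0) for b in (0.0, 0.5, 1.0)]

if __name__ == "__main__":
    untrained = train(data, n_hidden=2, epochs=0)
    trained = train(data, n_hidden=2)
    print("error before training:", reconstruction_error(*untrained, data))
    print("error after training: ", reconstruction_error(*trained, data))
    print("2-D code for first row:", encode(trained[0], data[0]))
```

Because the four toy features depend on only two underlying factors, a two-unit code is enough to carry most of the information, which is the sense in which the encoder picks out useful features.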

Using autoencoding improved performance over the traditional algorithms: k-NN, SOM, and decision tree (Kazemi & Zarrabi, 2017). The first approach was an autoencoder with 20 neurons in the input layer and 15, 10, and 5 neurons in the subsequent layers, respectively, and is therefore referred to as the underfit deep autoencoder (UDAE). The second approach used 20 neurons in the input layer and 30, 50, and 100 neurons in the subsequent layers, respectively, with sparsity limitations of 20%, 10%, and 5%, and is therefore referred to as the overfit deep autoencoder (ODAE).
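The two layer layouts can be written down as plain lists to see how different they are in size. The sketch below uses the layer widths and sparsity percentages given above. The helper names are hypothetical, and reading a sparsity limit as "keep only the largest fraction of activations in a layer" is an assumption, since the summary does not say how the limit was enforced.

```python
# Layer widths and sparsity percentages as reported in the text.
UDAE_LAYERS = [20, 15, 10, 5]        # each layer narrower than the last
ODAE_LAYERS = [20, 30, 50, 100]      # each layer wider than the last
ODAE_SPARSITY = [0.20, 0.10, 0.05]   # one limit per hidden layer


def weight_shapes(layers):
    """Return the (inputs, outputs) shape of each dense layer in the encoder."""
    return list(zip(layers[:-1], layers[1:]))


def parameter_count(layers):
    """Count weights plus biases for the encoder half of the network."""
    return sum(n_in * n_out + n_out for n_in, n_out in weight_shapes(layers))


def active_units(layers, sparsity):
    """Units allowed to stay active per hidden layer under the assumed reading."""
    return [max(1, round(width * s)) for width, s in zip(layers[1:], sparsity)]


def keep_top_fraction(activations, fraction):
    """Zero all but the largest `fraction` of activations (assumed sparsity rule)."""
    k = max(1, round(len(activations) * fraction))
    threshold = sorted(activations, reverse=True)[k - 1]
    kept = 0
    result = []
    for value in activations:
        # The counter keeps exactly k values even when several tie at the threshold.
        if value >= threshold and kept < k:
            result.append(value)
            kept += 1
        else:
            result.append(0.0)
    return result


if __name__ == "__main__":
    print("UDAE layer shapes:", weight_shapes(UDAE_LAYERS))
    print("ODAE layer shapes:", weight_shapes(ODAE_LAYERS))
    print("UDAE encoder parameters:", parameter_count(UDAE_LAYERS))
    print("ODAE encoder parameters:", parameter_count(ODAE_LAYERS))
    print("ODAE active units per layer:", active_units(ODAE_LAYERS, ODAE_SPARSITY))
```

With these widths the UDAE encoder has 530 weights and biases and the ODAE encoder has 7,280, while the assumed limits would leave 6, 5, and 5 active units in the three ODAE hidden layers.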

Reference:

Kazemi, Z., & Zarrabi, H. (2017). Using deep networks for fraud detection in the credit card transactions. 2017 IEEE 4th International Conference on Knowledge-Based Engineering and Innovation, 630-633.
