Supervised and Unsupervised Learning
- ali@fuzzywireless.com
- Mar 3, 2022
Supervised machine learning is a learning technique that estimates the mapping function relating an input variable, say X, to an output variable, say Y; the relation can be expressed as Y = f(X). The objective is to approximate this mapping function from a labeled training set through an iterative process, so that when new input data (X) is applied, the output (Y) can be predicted accurately (Brownlee, 2016). Supervised learning problems can be further divided into regression and classification. Typical supervised algorithms are linear regression for regression problems, random forest for both classification and regression problems, and support vector machines, which are commonly used for classification problems.
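To make Y = f(X) concrete, here is a minimal sketch of simple linear regression fitted with ordinary least squares in plain Python. The data points and the helper names `fit_line` and `predict` are illustrative assumptions, not something taken from the cited sources.

```python
# Minimal sketch: learn f in Y = f(X) from labeled (x, y) pairs.
# The data and the helper names are hypothetical, chosen for illustration.

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error on the training pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned mapping to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Training set: inputs X with known outputs Y (roughly y = 2x + 1 plus noise).
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [3.1, 4.9, 7.2, 8.8, 11.0]

model = fit_line(train_x, train_y)
prediction = predict(model, 6.0)  # new input that was not in the training set
print("slope, intercept:", model)
print("prediction for x = 6:", prediction)
```

The fitted slope and intercept play the role of the learned mapping function; a classification algorithm would follow the same pattern but predict a category instead of a number.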
A common problem with supervised learning is “overfitting”, which means that the model learns the training data itself, including its noise, instead of the underlying function (Donalek, 2011). An overfitted model fits the training data well but performs poorly on new input data. The problem can be detected and mitigated by splitting the data into proper subsets: training, validation and test sets. Popular supervised learning models, such as neural networks (for example, the multi-layer perceptron) and decision trees, excel in different applications.
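The effect of overfitting and the role of the three subsets can be sketched as follows. The example assumes synthetic data (a linear function plus Gaussian noise), a 60/20/20 split, and a deliberately memorizing 1-nearest-neighbor model; all of these are illustrative choices rather than a recommended setup.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable


def true_function(x):
    # Hypothetical underlying function that the model should recover.
    return 2.0 * x + 1.0


# Synthetic labeled data: underlying function plus Gaussian noise.
data = []
for _ in range(100):
    x = random.uniform(0.0, 10.0)
    data.append((x, true_function(x) + random.gauss(0.0, 1.0)))

# Split into training, validation and test subsets (60/20/20).
random.shuffle(data)
train, validation, test = data[:60], data[60:80], data[80:]


def nearest_neighbor_predict(train_set, x):
    """Memorizing model: return the label of the closest training point."""
    return min(train_set, key=lambda pair: abs(pair[0] - x))[1]


def mse(train_set, eval_set):
    """Mean squared error of the memorizing model on eval_set."""
    errors = [(nearest_neighbor_predict(train_set, x) - y) ** 2 for x, y in eval_set]
    return sum(errors) / len(errors)


train_error = mse(train, train)            # zero: the data itself was memorized
validation_error = mse(train, validation)  # larger: the noise does not generalize
test_error = mse(train, test)              # held back for a final, unbiased estimate

print("train:", train_error)
print("validation:", validation_error)
print("test:", test_error)
```

The gap between the training error and the validation error is the signal of overfitting. In practice the validation set is used to choose between models, and the test set is consulted only once at the end.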
Unsupervised learning is a technique where only the input variable X is available, and the algorithm tries to learn the distribution of X and the underlying structure of the data (Brownlee, 2016). Modeling the input (X) in this way can reveal interesting trends, groupings and discoveries. Typical use cases of unsupervised learning are clustering and association problems, for example k-means for clustering and the Apriori algorithm for association rule learning.
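As an illustration of clustering without labels, below is a minimal k-means sketch in plain Python. The six sample points, the choice of two clusters, and the initial centroids are assumptions made for the example; real implementations usually pick initial centroids more carefully.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Basic k-means: alternate between assigning points and updating centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Only inputs X are given; there are no labels Y.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Hypothetical initialization: one starting centroid from each end of the data.
centroids, clusters = kmeans(points, [points[0], points[3]])

print("centroids:", centroids)
print("clusters:", clusters)
```

The algorithm recovers the two groups purely from the geometry of the inputs, which is the "underlying structure" that unsupervised learning looks for.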
Unsupervised learning is generally applied to large datasets without labels, which makes it harder than supervised learning (Mishra, 2017). Since the correct output is not known, it is difficult to judge whether the results are meaningful, so both internal and external evaluation are required. ‘Overfitting’ can also happen in unsupervised learning. Principal component analysis (PCA) can help reduce the risk of overfitting by lowering the dimensionality of the data. PCA represents the data using fewer dimensions by finding a sequence of linear combinations of the variables that have maximum variance and are mutually uncorrelated (Stanford, n.d.). PCA can also be used as a data visualization tool.
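The idea of a maximum-variance linear combination can be sketched by computing the first principal component of a small two-dimensional dataset. This sketch assumes power iteration on the sample covariance matrix, which is only one of several ways to compute PCA; the data and the function names `first_component` and `project` are hypothetical.

```python
import math


def first_component(points, iterations=100):
    """Return (means, direction) for the first principal component."""
    n = len(points)
    dims = len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    centered = [[p[d] - means[d] for d in range(dims)] for p in points]
    # Sample covariance matrix of the centered data.
    cov = [[sum(row[i] * row[j] for row in centered) / (n - 1)
            for j in range(dims)] for i in range(dims)]
    # Power iteration: repeated multiplication converges to the direction of
    # maximum variance, assuming the start vector is not orthogonal to it.
    v = [1.0] * dims
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


def project(points, means, direction):
    """Reduce each point to one coordinate along the given direction."""
    return [sum((p[d] - means[d]) * direction[d] for d in range(len(direction)))
            for p in points]


# Two correlated variables lying close to the line y = 2x.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]

means, direction = first_component(points)
reduced = project(points, means, direction)  # 2 dimensions reduced to 1

print("first principal direction:", direction)
print("one-dimensional representation:", reduced)
```

Keeping only the first component discards the low-variance direction, which here is mostly noise; the same projection onto two or three components is what makes PCA useful for visualization.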
References:
Brownlee, J. (2016). Supervised and unsupervised machine learning algorithms. Retrieved from https://machinelearningmastery.com/supervised-and-unsupervised-machine-learning-algorithms/
Donalek, C. (2011). Supervised and unsupervised learning. Retrieved from http://www.astro.caltech.edu/~george/aybi199/Donalek_Classif.pdf
Mishra, S. (2017). Unsupervised learning and data clustering. Retrieved from https://towardsdatascience.com/unsupervised-learning-and-data-clustering-eeecb78b422a
Stanford (n.d.). Unsupervised learning. Retrieved from https://lagunita.stanford.edu/c4x/HumanitiesScience/StatLearning/asset/unsupervised.pdf