Logistic Regression Models
- ali@fuzzywireless.com
- Mar 3, 2022
- 2 min read
Chandrayan (2017) defines supervised machine learning as the setting where the objective is to predict the value of a dependent variable for new, unseen independent variables after training the model on known inputs and outputs. Supervised learning algorithms are mainly classified into classification, regression, and ranking. Within classification, especially binary classification (yes/no, 1 or 0, etc.), there are several algorithms, namely logistic regression, support vector machines, decision trees, and neural networks (2017). Litter (2019) defines binary logistic regression models as those that compute the probability of a dichotomous outcome from multiple independent variables.
Binary logistic regression is fitted using maximum likelihood estimation (MLE), while linear regression uses the ordinary least squares approach (Chandrayan, 2017). MLE in logistic regression is an iterative process: the model starts with an initial guess for the weight of each independent variable, then keeps adjusting the weights until there is no further change in the predicted value of the dependent variable (Litter, 2019).
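As a rough illustration of this iterative fitting (a minimal sketch, not code from either cited source), the weights can be updated by gradient ascent on the log-likelihood until the updates become negligible:

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, tol=1e-6, max_iter=10_000):
    """Fit logistic regression weights by gradient ascent on the log-likelihood."""
    X = np.column_stack([np.ones(len(X)), X])   # add an intercept column
    w = np.zeros(X.shape[1])                    # initial guess for the weights
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))        # current predicted probabilities
        step = lr * X.T @ (y - p)               # gradient of the log-likelihood
        w += step
        if np.max(np.abs(step)) < tol:          # stop once updates no longer change predictions
            break
    return w

# Toy data (hypothetical): the outcome becomes more likely as x grows.
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]])
y = np.array([0, 0, 1, 0, 1, 1])
print(fit_logistic(X, y))   # [intercept, slope]
```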
The logistic equation can be written as:
F(x) = L / (1 + e^(-k(x - x0)))
where,
L = maximum value of the curve
k = steepness of the curve
x0 = the sigmoid's midpoint; setting x0 = 0, L = 1, and k = 1 reduces F(x) to the standard sigmoid S(x) = 1 / (1 + e^-x)
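For concreteness, here is a small numeric sketch of the formulas above (not taken from the cited sources) showing that the general logistic curve reduces to the sigmoid when L = 1, k = 1, and x0 = 0:

```python
import math

def logistic(x, L=1.0, k=1.0, x0=0.0):
    """General logistic curve F(x) = L / (1 + e^(-k(x - x0)))."""
    return L / (1.0 + math.exp(-k * (x - x0)))

def sigmoid(x):
    """Special case S(x) = 1 / (1 + e^-x), i.e. L = 1, k = 1, x0 = 0."""
    return 1.0 / (1.0 + math.exp(-x))

print(logistic(0.0))                 # 0.5 -- the midpoint value discussed below
print(sigmoid(2.0), logistic(2.0))   # identical, since the defaults match the special case
```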
The “S”-shaped sigmoid function has finite limits: it approaches 0 as x approaches negative infinity and 1 as x approaches positive infinity (Chandrayan, 2017). The value of the sigmoid function at x = 0 is 0.5; thus, when the output is greater than 0.5 the outcome can be classified as 1 (or yes), while if the output is less than 0.5 the outcome can be classified as 0 (or no) (2017). Litter (2019) relates this equation to the “odds ratio” and highlights that logistic regression can be extended to more than two categorical outcomes, in which case it is called multinomial regression modelling. The only difference is that one value of the outcome must be used as the reference category. For instance, the outcome of a survey can be categorized into “poor”, “average”, “good”, “very good”, and “excellent”. In this case, the highest category can be tagged as the reference category to determine the odds of being in a “higher” category for each of the independent variables. For mathematical computation of binomial regression, the logit is used, which is the natural log of an odds ratio, often referred to as “log odds” (2019).
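To illustrate the logit and the classification threshold (a minimal sketch, not from the cited sources), the log-odds transform maps a probability back onto the linear scale, and a 0.5 probability cutoff gives the binary decision rule described above:

```python
import math

def logit(p):
    """Natural log of the odds p / (1 - p), i.e. the 'log odds'."""
    return math.log(p / (1.0 - p))

def classify(p, threshold=0.5):
    """Label an outcome 1 (yes) when the predicted probability exceeds the threshold."""
    return 1 if p > threshold else 0

print(logit(0.5))      # 0.0 -- even odds
print(logit(0.8))      # ~1.386 -- odds of 4 to 1
print(classify(0.73))  # 1
```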
References:
Chandrayan, P. (2017). Machine learning part 3: Logistic regression. Retrieved from https://towardsdatascience.com/machine-learning-part-3-logistics-regression-9d890928680f
Litter, S. (2019). Analysing categorical data using logistic regression models. Retrieved from https://select-statistics.co.uk/blog/analysing-categorical-data-using-logistic-regression-models/