Bayesian Analysis
- ali@fuzzywireless.com
- Mar 3, 2022
- 3 min read
There are two widely used schools of statistics, namely frequentist statistics and Bayesian statistics (Analytics Vidhya, 2016). Frequentist statistics determines whether an event happens by calculating its probability in an experiment, under the assumption that repeating the experiment under the same conditions yields the same long-run outcome. For example, a fair coin tossed 10 times may give 4 heads, while in another experiment a fair coin tossed 10,000 times gives 5,067 heads. Although the absolute deviation from the expected count has increased from 1 (4 heads instead of 5) to 67 (5,067 heads instead of 5,000), the proportion of heads approaches 0.5 or 50% as the number of tosses grows. This highlights that the outcome of an experiment depends on the number of times it is performed: the p-value differs with different sample sizes, and the confidence interval likewise varies with sample size and does not identify the most probable parameter values (2016).
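The coin-toss behaviour described above can be checked with a short simulation. This is an illustrative sketch, not taken from the cited sources; the exact head counts vary from run to run, but the proportion stabilizes near 0.5 as the number of tosses grows:

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

def head_proportion(n_tosses):
    """Simulate n_tosses fair-coin flips and return the proportion of heads."""
    heads = sum(random.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# The deviation in counts can grow while the proportion converges to 0.5.
for n in (10, 100, 10000):
    print(n, round(head_proportion(n), 3))
```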
Bayesian statistics, on the other hand, applies probabilities directly to statistical problems and thus provides a tool to update beliefs in the light of new data (Analytics Vidhya, 2016). Bayes' theorem is based on conditional probability. For example, if the fairness of the coin is the parameter 'u' and the outcome is 'D', then the probability of getting 4 heads out of 10 tosses for a given coin fairness is P(D | u). Conversely, the probability of the coin fairness being 'u' given the observed outcome 'D' is P(u | D).
Bayes' theorem is represented as:
P(u | D) = ( P(D | u) x P(u) ) / P(D)
where,
P(u) is termed the prior: the strength of belief in the fairness of the coin before the toss, a value between 0 and 1.
P(D | u) is the likelihood: the probability of observing the outcome 'D' for a given coin fairness 'u'.
P(D) is the evidence: the probability of the outcome, obtained by summing (or integrating) over all possible values of the coin fairness 'u', weighted by the strength of belief in each value of 'u'.
P(u | D) is the posterior: the updated belief given the evidence, i.e. the observed number of heads (2016).
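Bayes' theorem can be applied numerically. The sketch below is an illustration and not from the cited sources: it evaluates a uniform prior and a binomial likelihood over a grid of candidate fairness values 'u' for the 4-heads-in-10-tosses example, with the evidence computed exactly as described above, as a prior-weighted sum over all values of 'u':

```python
from math import comb

# Discrete grid of candidate fairness values u (illustrative choice).
grid = [i / 100 for i in range(101)]
prior = [1 / len(grid)] * len(grid)          # uniform prior P(u)

heads, tosses = 4, 10
# Binomial likelihood P(D | u): probability of 4 heads in 10 tosses.
likelihood = [comb(tosses, heads) * u**heads * (1 - u)**(tosses - heads)
              for u in grid]

# Evidence P(D): sum over all u, weighted by the prior belief in each u.
evidence = sum(l * p for l, p in zip(likelihood, prior))

# Posterior P(u | D) = P(D | u) * P(u) / P(D), by Bayes' theorem.
posterior = [l * p / evidence for l, p in zip(likelihood, prior)]

# The value of u with the highest posterior probability.
mode = grid[posterior.index(max(posterior))]
print(mode)  # 0.4
```

With a uniform prior the posterior mode coincides with the sample proportion of heads; a non-uniform prior would pull the estimate toward the prior belief.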
The underlying assumption in Bayesian analysis is that all parameters are random quantities (Stata, 2019). A Bayesian analysis is therefore summarized by an entire distribution of values instead of one fixed value as in frequentist analysis (2019). To define a Bayesian model, a likelihood function P(D | u) and a prior distribution P(u) are required; their product, normalized by the evidence, gives the posterior belief P(u | D) (Analytics Vidhya, 2016). A Bernoulli likelihood function can be used for P(D | u), while the prior over the coin fairness can be modelled using a beta distribution. The mode of the posterior distribution is the desired estimate (Weisstein, 2019). The null hypothesis in the Bayesian framework places all probability mass at one particular value of 'u' and zero elsewhere (Analytics Vidhya, 2016). The alternative hypothesis is that all values of 'u' are possible. The Bayes factor measures the magnitude of the shift in the distribution across the values of 'u'. To reject the null hypothesis, a Bayes factor of less than 1/10 is preferred; this differs from the p-value, as the Bayes factor is independent of the sample size and the number of coin-tossing iterations. The highest density interval (HDI) is computed from the posterior, that is, after incorporating the new data; a 95% HDI means there is a 95% probability that the parameter lies among those credible values, which differs from the interpretation of a confidence interval (2016).
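The beta-prior update and the HDI can be sketched as follows. This is an illustration under assumed choices, not from the cited sources: a Beta(2, 2) prior (a mild belief that the coin is fair), 4 heads in 10 tosses, the conjugate beta-Bernoulli update, and a grid approximation for the 95% HDI:

```python
from math import exp, lgamma, log

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, via log-gamma for numerical stability."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return exp(lgamma(a + b) - lgamma(a) - lgamma(b)
               + (a - 1) * log(x) + (b - 1) * log(1 - x))

# Assumed prior Beta(2, 2) and the article's 4-heads-in-10-tosses data.
a0, b0 = 2.0, 2.0
heads, tails = 4, 6
a, b = a0 + heads, b0 + tails          # conjugate update -> Beta(6, 8)

# Posterior mode: the desired point estimate.
mode = (a - 1) / (a + b - 2)           # (6 - 1) / (6 + 8 - 2) = 5/12

# Approximate 95% HDI: keep the densest grid points until 95% of the
# posterior mass is covered, then report their range.
n = 10001
xs = [i / (n - 1) for i in range(n)]
dens = [beta_pdf(x, a, b) for x in xs]
total = sum(dens)
order = sorted(range(n), key=lambda i: dens[i], reverse=True)
mass, kept = 0.0, []
for i in order:
    mass += dens[i] / total
    kept.append(xs[i])
    if mass >= 0.95:
        break
print(round(mode, 3), round(min(kept), 3), round(max(kept), 3))
```

In practice an MCMC sampler or a library routine would replace the grid sweep, but the idea is the same: the HDI is the narrowest span of credible values holding 95% of the posterior probability.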
Theoretically speaking, when the sample size is large and the parameters are normally distributed, the results of maximum likelihood and Bayesian estimation are quite close (Schoot, Kaplan, Denissen, Asendorpf, Neyer & Aken, 2013). However, the interpretation differs: Bayesian analysis incorporates background knowledge, while frequentist statistics does not. Similarly, Bayesian analysis updates knowledge as new data arrive, while frequentist statistics continues to test the null hypothesis again and again. Bayesian analysis also focuses on predictive accuracy instead of 'up or down' significance testing. From an experimental or practical perspective, Bayesian methods ease the issues associated with small sample sizes and handle non-normally distributed parameters better than frequentist methods. Unlikely or outrageous results are less likely under the Bayesian approach, as it is more conservative than frequentist methods. However, one key limitation is the use of prior knowledge, which must be declared transparently so that the background of the chosen prior values, and their influence on the outcome, is clear. Computation time is also higher for the Bayesian approach due to its iterative sampling techniques (Schoot et al., 2013).
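The large-sample agreement can be illustrated with a quick calculation. Assuming a Beta(2, 2) prior and hypothetical head counts (only the 4/10 and 5,067/10,000 cases come from the article; the 507/1,000 row is made up for the progression), the Bayesian posterior mean approaches the frequentist maximum-likelihood estimate, i.e. the sample proportion, as the sample grows:

```python
# Beta(a0, b0) prior; posterior mean of Beta(a0 + h, b0 + n - h) is
# (a0 + h) / (a0 + b0 + n), which the prior dominates less as n grows.
a0, b0 = 2.0, 2.0
for heads, tosses in [(4, 10), (507, 1000), (5067, 10000)]:
    mle = heads / tosses                         # frequentist estimate
    post_mean = (a0 + heads) / (a0 + b0 + tosses)  # Bayesian estimate
    print(tosses, round(mle, 4), round(post_mean, 4),
          round(abs(mle - post_mean), 6))
```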
References:
Schoot, R., Kaplan, D., Denissen, J., Asendorpf, J., Neyer, F. & Aken, M. (2013). A gentle introduction to Bayesian analysis: applications to developmental research. Retrieved from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4158865/
Stata (2019). What is Bayesian analysis? Retrieved from https://www.stata.com/features/overview/bayesian-intro/
Weisstein, E. (2019). Bayesian analysis. From MathWorld – A Wolfram Web Resource. Retrieved from http://mathworld.wolfram.com/BayesianAnalysis.html
Analytics Vidhya (2016). Bayesian statistics explained to beginners in simple English. Retrieved from https://www.analyticsvidhya.com/blog/2016/06/bayesian-statistics-beginners-simple-english/
