The cult of statistical significance. Hidden random uncertainties in science, their roots in the human mind, and unexpected outcomes 1400-236KIS
This lecture examines the common practice of reducing statistical inference to tests of statistical significance, the possible causes of that practice, and its negative consequences for science. Using selected simple statistical methods, the lecture demonstrates how little information statistical testing provides and how often that information is overestimated. Along the way, common misconceptions about statistical significance, and about its absence, are analyzed. The question of their causes leads to considerations of how the human mind works, including in an evolutionary context. The question of their consequences leads to the undoubtedly substantial role of the statistical testing paradigm in the increasingly recognized reproducibility crisis in science. The selected methods are considered less in terms of their mathematical foundations than in terms of the kind of answers they give about the fragments of reality under study. They include parametric and nonparametric methods for one and two samples, one-way and two-way analysis of variance, methods for analyzing frequencies, correlation analysis, linear and logistic regression, and examples of multivariate methods. Particular attention is devoted to the analysis of interactions, one of the most neglected topics in textbooks. Discussion of these methods leads to more general issues and questions that occupy a significant portion of the lecture, such as:
(1) The most important arguments in the decades-long debate on the usefulness of significance testing – from acceptance, through recognition of its necessity, to the demand for its elimination from scientific practice. The power of statistical tests. Widespread but rarely respected recommendations regarding sufficient statistical power. The paradox of excessive power, resulting from misunderstandings of tests. The power of tests and the precision of estimates.
(2) The contemporary paradigm of statistical significance testing as a surprising product of cultural evolution – a hybrid of various concepts whose authors would likely disagree with the current result.
(3) What is a negative result? Inference from ignorance, the ubiquity of Type II errors. When is a negative result a result, and when is it a lack of a result? Knowing that there are no significant effects can be important knowledge; sometimes it is precisely what is needed. Important lessons from (bio)equivalence testing methods. The acceptance of null hypotheses, contrary to general principles, as a common element of statistical procedures, has a serious negative impact on the validity of inferences. The extreme consequences of John Ioannidis's views: when is advanced research equipment reduced to the role of the most expensive random number generator?
(4) The "hunt" for statistical significance, the so-called "torturing of data," and sometimes even the "torturing of reality." Interpretations of "almost significant" results: what might underlie them? The traditional distinction between statistical significance and substantive significance and its incompleteness from the perspective of controlling the risk of bias.
(5) Statistical significance perceived, absurdly, as a dimension of reality. Elements of an analysis of the language used to describe the statistical analysis of research results. Confusion between statements about samples and statements about the population. Figures of speech that sidestep not only quantitative questions but also the universally required criterion of statistical significance. How seriously do researchers take statistical inference? Researchers' expectations of statisticians and the actual place of statistical inference in research processes. Statistical significance, empathy, and the ethics of data analysis.
(6) Diminishing effects in subsequent studies of the same phenomenon. The so-called significance filter and overestimation of effects, the "winner's curse." Corrections for multiple testing, differences in statisticians' views on the appropriateness of their use, the frequency of false discoveries, and selective presentation of results.
(7) How strong a certainty do we want, and what should it refer to? The risk of Type I and Type III (directional, Type S) errors and confidence levels. The mistake made by Jerzy Neyman, considered the father of confidence intervals, in justifying the need for them: do researchers really want interval estimates? The earlier history of confidence intervals, the recurring, ineffective calls for the use of interval estimation, the so-called "statistical reform" and the "new statistics."
(8) Two important obstacles to interval thinking: uncertainty aversion and excessive optimism. Psychological aspects and attempts at evolutionary explanations. The law of large numbers and belief in the "law of small numbers" (sensu Tversky and Kahneman). Behavior in the face of uncertainty, L. Savage's Sure Thing Principle and its frequent violations. Depressive realism and the evolution of excessive optimism. Optimism as an individual's strategy or as an individual's implementation of an evolutionary strategy into which it happens to be embedded. From what perspective is the "obsession with averages" a mistake, and from what perspective is it not?
(9) The reproducibility and repeatability crisis in science (irreproducibility crisis), its publicity over the past two decades, and initiatives undertaken to understand the problem. Reproducibility and repeatability in metrology versus the less precise understanding of these terms in pure science. When is the result of a previous experiment considered to be replicable? The role of the statistical testing paradigm in the repeatability crisis. The cult of statistical significance as a phenomenon that distracts from important questions. Selected attempts to improve the situation and the related controversies. Some computational aspects of the discussed approaches to statistical inference are the subject of a separate course, "Revealing Statistical Uncertainties Hidden in Research Results."
Warning: Recognizing the scale of statistical uncertainty can discourage the mechanical and thoughtless application of many common procedures.
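As an illustration of topics (1), (6), and (7) above, the following is a minimal sketch of the significance filter and the resulting overestimation of effects, not a method taken from the course itself. All settings are assumptions chosen for illustration: a true standardized effect of 0.3, 20 observations per group, a known standard deviation of 1, and a two-sided z-test at the 0.05 level. The function name `simulate_significance_filter` and its parameters are hypothetical.

```python
import math
import random
import statistics


def simulate_significance_filter(true_effect=0.3, n_per_group=20,
                                 n_studies=20000, seed=1):
    """Simulate many identical two-group studies and keep the 'significant' ones.

    Illustrative assumptions: outcomes have a known standard deviation of 1,
    so each study's estimated difference in means is drawn directly from its
    sampling distribution N(true_effect, se**2) and tested with a two-sided
    z-test at the 0.05 level. true_effect must be positive so that a
    negative significant estimate can be counted as a wrong-sign result.
    """
    if true_effect <= 0:
        raise ValueError("true_effect must be positive in this sketch")

    rng = random.Random(seed)
    se = math.sqrt(2.0 / n_per_group)   # standard error of the difference
    z_crit = 1.959963984540054          # two-sided 5% critical value

    # One estimated effect per simulated study.
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]

    # The "significance filter": only these would typically be reported.
    significant = [e for e in estimates if abs(e) / se > z_crit]
    if not significant:
        raise RuntimeError("no significant studies; increase n_studies")

    mean_abs_significant = statistics.fmean([abs(e) for e in significant])
    return {
        # Share of studies reaching significance (empirical power).
        "power": len(significant) / n_studies,
        # Average estimate over all studies (close to the true effect).
        "mean_estimate_all": statistics.fmean(estimates),
        # Average magnitude among significant studies only.
        "mean_abs_estimate_significant": mean_abs_significant,
        # How much the filtered studies exaggerate the true effect.
        "exaggeration_ratio": mean_abs_significant / true_effect,
        # Share of significant studies pointing in the wrong direction.
        "wrong_sign_share": sum(e < 0 for e in significant) / len(significant),
    }


if __name__ == "__main__":
    for name, value in simulate_significance_filter().items():
        print(f"{name}: {value:.3f}")
```

Under these assumed settings, an estimate must exceed roughly 1.96 × 0.316 ≈ 0.62 in magnitude to pass the filter, so every "significant" study necessarily reports more than twice the true effect of 0.3, while the average over all studies stays close to it.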
Main fields of studies for MISMaP
mathematics
psychology
biotechnology
environmental protection
geology
applied geology
astronomy
spatial development
physics
geography
chemistry
computer science
Type of course
elective monographs
general courses
optional courses
Bibliography
Additions will be made during the course.
American Statistician Special Supplement. 2019. Statistical Inference in the 21st Century: A World Beyond p<0.05. Am. Stat. 73(sup1): 1-401.
Amrhein V, Greenland S, McShane B. 2019. Retire statistical significance. Nature, 567: 305-307.
Appiah KA. 2017. As If: Idealization and Ideals. Harvard University Press.
Berry D.A. 1996. Statistics. A Bayesian Perspective. Duxbury Press.
Boyd B. 2009. On the Origin of Stories. Evolution, Cognition, and Fiction. Harvard University Press.
Burton R.A. 2008. On Being Certain. Believing You Are Right Even When You're Not. St. Martin's Griffin.
Clarke BS, Clarke JL. 2018. Predictive statistics. Cambridge University Press.
Cumming G. 2011. Understanding the New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis. Routledge.
Dawkins R. 1982. The Extended Phenotype. The Gene as the Unit of Selection. Freeman. [Fenotyp rozszerzony. Dalekosiężny gen. 2003, Prószyński i S-ka]
Distin K. 2005. The Selfish Meme. A Critical Reassessment. Cambridge University Press.
Gelman A, Hill J, Vehtari A. 2021. Regression and Other Stories. Cambridge University Press.
Gigerenzer G. 2008. Rationality for Mortals. How People Cope with Uncertainty. Oxford University Press.
Halsey LG, Curran-Everett D, Vowler SL, Drummond G. 2015. The fickle P value generates irreproducible results. Nature Methods, 12: 179-185.
Harlow L, Mulaik S, Steiger J, editors. 1997. What If There Were No Significance Tests? Lawrence Erlbaum Associates.
Hogarth R.M. 2001. Educating Intuition. University of Chicago Press.
Hubbard R. 2015. Corrupt research. SAGE.
Ioannidis J.P.A. 2005. Why most published research findings are false. PLoS Med 2(8): e124.
Jackman S. 2009. Bayesian Analysis for the Social Sciences. Wiley.
Kahneman D. 2011. Thinking, Fast and Slow. Farrar, Straus and Giroux. [Pułapki myślenia. O myśleniu szybkim i wolnym. 2012, Media Rodzina]
Kahneman D, Sibony O, Sunstein CR. 2021. Noise. Little, Brown and Company. [Szum. Media Rodzina, 2022].
Kline R.B. 2004. Beyond Significance Testing. Reforming Data Analysis Methods in Behavioral Research. American Psychological Association.
Kurzban R. 2012. Why Everybody (Else) Is a Hypocrite: Evolution and the Modular Mind. Princeton University Press.
Lazzeroni LC, Lu Y, Belitskaya-Levy I. 2014. P-values in genomics: Apparent precision masks high uncertainty. Molecular Psychiatry, 19: 1336–1340.
Lecoutre B, Poitevineau J. 2014. The Significance Test Controversy Revisited. The Fiducial Bayesian Alternative. Springer.
Meeker WQ, Hahn GJ, Escobar LA. 2017. Statistical intervals: A guide for practitioners and researchers. Wiley.
Motulsky H. 2014. Intuitive Biostatistics, 3rd edition. Oxford University Press.
Nature Publishing Group. 2013. Announcement: Reducing our irreproducibility. Nature 496: 398.
Nuzzo R. 2014. Scientific method: statistical errors. Nature 506: 150-152.
Panter A, Sterba S, editors. 2011. Handbook of Ethics in Quantitative Methodology. Routledge.
Salsburg D. 2001. The Lady Tasting Tea. How Statistics Revolutionized Science In the Twentieth Century. Holt.
Savage S. 2012. The Flaw of Averages: Why We Underestimate Risk in the Face of Uncertainty. Wiley.
Schweder T, Hjort NL. 2016. Confidence, Likelihood, Probability: Statistical Inference with Confidence Distributions. Cambridge University Press.
Tannert C, Elvers HD, Jandrig B. 2007. The ethics of uncertainty. EMBO Reports, 8: 892-896.
Vaihinger H. 1925/2015. The Philosophy of As If. Random Shack.
Wang C. 1992. Sense and Nonsense of Statistical Inference: Controversy, Misuse, and Subtlety. CRC Press.
Wasserstein RL, Schirm AL, Lazar NA. 2019. Moving to a World Beyond "p<0.05". Am. Stat. 73(sup1): 1-19.
Ziliak S.T., McCloskey D.N. 2008. The Cult of Statistical Significance. How the Standard Error Costs Us Jobs, Justice, and Lives. University of Michigan Press.
Additional information
Additional information (registration calendar, instructors, location and schedule of classes) may be available in the USOSweb system.