No Estimation without Inference
A Response to the International Society of Physiotherapy Journal Editors
Keywords: physical therapy, statistical significance, inference, estimation
Recently, the journal Physical Therapy published a joint editorial: Elkins, M. R. et al. Statistical inference through estimation: recommendations from the International Society of Physiotherapy Journal Editors. Phys. Ther. 102, pzac066 (2022).
This editorial, published on behalf of the International Society of Physiotherapy Journal Editors (ISPJE), recommends that researchers stop using null hypothesis significance tests and adopt “estimation methods”. Further, the editorial warns that this is not merely an idea to consider but forthcoming journal policy: “the [ISPJE] will be expecting manuscripts to use estimation methods instead of null hypothesis statistical tests”.
However, the editorial is deeply flawed in its statistical reasoning. In this critical commentary I will show that the editorial: (1) fails to adequately grapple with the inherent connection between “statistical inference” and “estimation” methods; (2) presents several misleading arguments about the flaws of significance tests; and (3) presents an alternative that is itself a form of significance test: the minimum-effect test. Finally, I end with a short list of more urgent problems that the ISPJE could work to address.
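To make points (1) and (3) concrete, the following is a minimal Python sketch, not a method taken from the editorial. It assumes a single effect estimate with a known standard error and uses a normal (z) approximation rather than the t or F distributions used in practice (Murphy and Myors develop minimum-effect tests in the general linear model). The function names and the numbers are hypothetical, chosen only for illustration. The sketch shows that a 95% confidence interval excludes zero exactly when the two-sided p-value is below 0.05, and that a one-sided minimum-effect test of H0: effect ≤ Δ at α = 0.05 rejects exactly when the lower limit of the 90% interval lies above Δ.

```python
from statistics import NormalDist

# Illustrative sketch only: normal-approximation (z) analysis of one effect
# estimate with a known standard error. Real analyses would usually use t or F
# distributions; all names and numbers here are hypothetical.
STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def z_interval(estimate, se, level=0.95):
    """Two-sided normal-approximation confidence interval."""
    z_crit = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    return estimate - z_crit * se, estimate + z_crit * se


def two_sided_p(estimate, se, null_value=0.0):
    """Two-sided p-value for the point null H0: effect == null_value."""
    z = (estimate - null_value) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def minimum_effect_p(estimate, se, delta):
    """One-sided p-value for H0: effect <= delta versus H1: effect > delta."""
    z = (estimate - delta) / se
    return 1 - STD_NORMAL.cdf(z)


if __name__ == "__main__":
    # Hypothetical values for illustration, not data from any study.
    estimate, se = 4.0, 1.5  # between-group mean difference and its SE
    delta = 2.0              # hypothetical smallest effect of interest (e.g. an MCID)
    alpha = 0.05

    # Duality: the 95% CI excludes 0 exactly when the two-sided p < 0.05.
    lo95, hi95 = z_interval(estimate, se, 0.95)
    p0 = two_sided_p(estimate, se)
    print(f"95% CI: ({lo95:.2f}, {hi95:.2f}); p vs 0 = {p0:.4f}")
    print("CI excludes 0:", lo95 > 0 or hi95 < 0, "| p < alpha:", p0 < alpha)

    # Minimum-effect test: reject H0 (effect <= delta) at alpha exactly when
    # the lower limit of the (1 - 2*alpha) two-sided CI lies above delta.
    lo90, _ = z_interval(estimate, se, 1 - 2 * alpha)
    p_min = minimum_effect_p(estimate, se, delta)
    print(f"90% CI lower limit: {lo90:.2f}; minimum-effect p = {p_min:.4f}")
    print("lower limit > delta:", lo90 > delta, "| p < alpha:", p_min < alpha)
```

With these made-up values the 95% interval (about 1.06 to 6.94) excludes zero, so the difference is “significant”; but the 90% lower limit (about 1.53) falls below Δ = 2, so the minimum-effect test does not reject (p ≈ 0.09). In both cases, reading the interval against a benchmark and running the test lead to the same decision.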
Elkins, M. R. et al. Statistical inference through estimation: recommendations from the International Society of Physiotherapy Journal Editors. Phys. Ther. 102, pzac066 (2022).
Murphy, K. R. & Myors, B. Testing the hypothesis that treatments have negligible effects: Minimum-effect tests in the general linear model. J. Appl. Psychol. 84, 234–248 (1999).
Rafi, Z. & Greenland, S. Semantic and cognitive tools to aid statistical science: replace confidence and significance by compatibility and surprise. BMC Med. Res. Methodol. 20, 244 (2020).
Cohen, J. The earth is round (p < .05). Am. Psychol. 49, 997–1003 (1994).
Lakens, D. The Practical Alternative to the p Value Is the Correctly Used p Value. Perspect. Psychol. Sci. 16, 639–648 (2021).
Herbert, R. Research Note: Significance testing and hypothesis testing: meaningless, misleading and mostly unnecessary. J. Physiother. 65, 178–181 (2019).
Goodman, S. N. & Royall, R. Evidence and scientific research. Am. J. Public Health 78, 1568–1574 (1988).
Lakens, D. Why p-values are not measures of evidence. PsyArXiv (2021).
Goodman, S. N. Toward Evidence-Based Medical Statistics. 1: The P Value Fallacy. Ann. Intern. Med. 130, 995–1004 (1999).
Open Science Collaboration. Estimating the reproducibility of psychological science. Science 349, aac4716 (2015).
Patil, P., Peng, R. D. & Leek, J. T. What Should Researchers Expect When They Replicate Studies? A Statistical View of Replicability in Psychological Science. Perspect. Psychol. Sci. 11, 539–544 (2016).
Scheel, A. M., Schijen, M. R. M. J. & Lakens, D. An Excess of Positive Results: Comparing the Standard Psychology Literature With Registered Reports. Adv. Methods Pract. Psychol. Sci. 4, 25152459211007468 (2021).
Ioannidis, J. P. Why most published research findings are false. PLoS Med. 2, e124 (2005).
Anderson, S. F. & Maxwell, S. E. Addressing the “Replication Crisis”: Using Original Studies to Design Replication Studies with Appropriate Statistical Power. Multivar. Behav. Res. 52, 305–324 (2017).
Nosek, B. A. et al. Replicability, robustness, and reproducibility in psychological science. Annu. Rev. Psychol. 73, 719–748 (2022).
Borg, D. N. et al. Sharing data and code: a comment on the call for the adoption of more transparent research practices in sport and exercise science. (2020).
Caldwell, A. & Vigotsky, A. D. A case against default effect sizes in sport and exercise science. PeerJ 8, e10314 (2020).
McGrath, R. E. & Meyer, G. J. When effect sizes disagree: the case of r and d. Psychol. Methods 11, 386 (2006).
Levine, T. R. & Hullett, C. R. Eta Squared, Partial Eta Squared, and Misreporting of Effect Size in Communication Research. Hum. Commun. Res. 28, 612–625 (2002).
Tenan, M. & Caldwell, A. A Critical Review of Physiotherapy Editor’s Comments on Statistical Practice. SportRxiv.
Dabija, D. I. & Jain, N. B. Minimal Clinically Important Difference of Shoulder Outcome Measures and Diagnoses: A Systematic Review. Am. J. Phys. Med. Rehabil. 98, 671–676 (2019).
Fricker Jr, R. D., Burke, K., Han, X. & Woodall, W. H. Assessing the statistical analyses used in basic and applied social psychology after their p-value ban. Am. Stat. 73, 374–384 (2019).
Sainani, K. L. The Problem with “Magnitude-based Inference”. Med. Sci. Sports Exerc. 50, 2166–2176 (2018).
Sainani, K. L., Lohse, K. R., Jones, P. R. & Vickers, A. Magnitude-based inference is not Bayesian and is not a valid method of inference. Scand. J. Med. Sci. Sports 29, 1428 (2019).
Lohse, K. R. et al. Systematic review of the use of “magnitude-based inference” in sports science and medicine. PloS One 15, e0235318 (2020).
Benjamin, D. J. et al. Redefine statistical significance. Nat. Hum. Behav. 2, 6–10 (2018).
Lakens, D. et al. Justify your alpha. Nat. Hum. Behav. 2, 168–171 (2018).
Amrhein, V. & Greenland, S. Remove, rather than redefine, statistical significance. Nat. Hum. Behav. 2, 4 (2018).
McShane, B. B., Gal, D., Gelman, A., Robert, C. & Tackett, J. L. Abandon statistical significance. Am. Stat. 73, 235–245 (2019).
Simmons, J. P., Nelson, L. D. & Simonsohn, U. Life after p-hacking. in Meeting of the society for personality and social psychology, New Orleans, LA 17–19 (2013).
Simmons, J. P., Nelson, L. D. & Simonsohn, U. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. (2016).
Sun, X. et al. Credibility of claims of subgroup effects in randomised controlled trials: systematic review. BMJ 344, (2012).
Kerr, N. L. HARKing: Hypothesizing after the results are known. Personal. Soc. Psychol. Rev. 2, 196–217 (1998).
Rosenthal, R. The file drawer problem and tolerance for null results. Psychol. Bull. 86, (1979).
Borg, D. N., Lohse, K. R. & Sainani, K. L. Ten common statistical errors from all phases of research, and their fixes. PM&R 12, 610–614 (2020).
Leek, J. T. & Peng, R. D. Statistics: P values are just the tip of the iceberg. Nature 520, 612 (2015).
Wasserstein, R. L., Schirm, A. L. & Lazar, N. A. Moving to a world beyond “p < 0.05”. Am. Stat. 73, 1–19 (2019).
Copyright (c) 2022 Keith Lohse
This work is licensed under a Creative Commons Attribution 4.0 International License.