On the replicability of sports and exercise science research: assessing the prevalence of publication bias and studies with underpowered designs by a z-curve analysis
DOI: https://doi.org/10.51224/SRXIV.534
Keywords: publication bias, statistical power, replicability
Abstract
The sports science replication project has raised concerns about the replicability of published research. Low replication rates can have several causes. One possible cause is an excess of significant results produced by publication bias, where selection for significance inflates the proportion of significant findings in the literature while the statistical power to detect effects remains substantially lower. To date, no study has systematically assessed the average statistical power of research in the field. One method to assess both publication bias and average statistical power is the z-curve method. In this study, we manually extracted 350 independent p-values, each corresponding to the hypothesis tested in one of 350 studies published across 10 applied sports and exercise science journals. After exclusions, a z-curve analysis was performed on 269 independent p-values. The estimate of the Observed Discovery Rate (0.68) is larger than the upper bound of the 95% confidence interval (CI) of the Expected Discovery Rate [0.05; 0.33], indicating strong publication bias in the literature. The average statistical power is 11%, 95% CI [0.05; 0.33], and only 29% of studies are estimated to have been designed with high power. The Expected Replication Rate was 0.49, 95% CI [0.36; 0.61], indicating that only 49% of direct replications with the same sample size would be expected to replicate. Publication bias, combined with low average statistical power, is likely to result in a body of literature characterized by inflated effect sizes, a high proportion of type I and type II errors, and therefore low replicability. Addressing these issues requires a collective effort to build a more informative and reliable knowledge base.
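For readers unfamiliar with the inputs to a z-curve analysis, the minimal Python sketch below (not the authors' analysis code) illustrates how two-sided p-values are converted to z-scores and how the Observed Discovery Rate is computed. Only the sample of 269 p-values and the conventional 5% alpha level are taken from the abstract; the p-values themselves are simulated placeholders. Estimating the Expected Discovery Rate and Expected Replication Rate additionally requires fitting the z-curve mixture model, which is omitted here.

```python
import numpy as np
from scipy import stats

# Placeholder p-values for illustration only; the study analysed 269 manually
# extracted, independent p-values (one focal hypothesis test per study).
rng = np.random.default_rng(42)
p_values = rng.uniform(0.0001, 0.20, size=269)

# Convert two-sided p-values to absolute z-scores, the input to a z-curve analysis.
z_scores = stats.norm.ppf(1 - p_values / 2)

# Observed Discovery Rate (ODR): proportion of reported results significant at
# alpha = .05, i.e. |z| > 1.96.
alpha = 0.05
z_crit = stats.norm.ppf(1 - alpha / 2)
odr = float(np.mean(z_scores > z_crit))
print(f"Observed Discovery Rate: {odr:.2f}")

# The Expected Discovery Rate (EDR) and Expected Replication Rate (ERR) reported
# in the abstract are estimated by fitting the z-curve mixture model (truncated
# folded normal components) to the significant z-scores; that estimation step is
# not implemented in this sketch.
```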
License
Copyright (c) 2025 Cristian Mesquida, Jennifer Murphy, Joe Warne, Daniël Lakens (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.