On the replicability of sports and exercise science research: assessing the prevalence of publication bias and studies with underpowered designs by a z-curve analysis
DOI: https://doi.org/10.51224/SRXIV.534
Keywords: publication bias, statistical power, replicability
Abstract
The sports science replication project has raised concerns about the replicability of published research. Low replication rates can have several causes. One possible cause is an excess of significant results produced by publication bias, where selection for significance inflates the proportion of significant findings in the literature while the statistical power to detect effects remains substantially lower. To date, no study has systematically assessed the average statistical power of research in the field. One method to assess both publication bias and average statistical power is the z-curve method. In this study, we manually extracted 350 independent p-values, each corresponding to the hypothesis tested in one of 350 studies published across 10 applied sports and exercise science journals. After exclusions, a z-curve analysis was performed on 269 independent p-values. The estimate of the Observed Discovery Rate (0.68) is larger than the upper bound of the 95% confidence interval (CI) of the Expected Discovery Rate [0.05; 0.33], indicating strong publication bias in the literature. The average statistical power is 11%, 95% CI [0.05; 0.33], and only 29% of studies are estimated to have been designed with high power. The Expected Replication Rate was 0.49, 95% CI [0.36; 0.61], indicating that only 49% of direct replications with the same sample size would be expected to replicate. Publication bias, combined with low average statistical power, is likely to result in a body of literature characterized by inflated effect sizes, a high proportion of type I and type II errors, and therefore low replicability. Addressing these issues requires a collective effort to build a more informative and reliable knowledge base.
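For readers unfamiliar with the inputs to a z-curve analysis, the minimal Python sketch below (not the authors' analysis code) illustrates how two-sided p-values are converted to z-scores and how the Observed Discovery Rate is computed. Only the sample of 269 p-values and the conventional 5% alpha level are taken from the abstract; the p-values themselves are simulated placeholders. Estimating the Expected Discovery Rate and Expected Replication Rate additionally requires fitting the z-curve mixture model, which is omitted here.

```python
import numpy as np
from scipy import stats

# Placeholder p-values for illustration only; the study analysed 269 manually
# extracted, independent p-values (one focal hypothesis test per study).
rng = np.random.default_rng(42)
p_values = rng.uniform(0.0001, 0.20, size=269)

# Convert two-sided p-values to absolute z-scores, the input to a z-curve analysis.
z_scores = stats.norm.ppf(1 - p_values / 2)

# Observed Discovery Rate (ODR): proportion of reported results significant at
# alpha = .05, i.e. |z| > 1.96.
alpha = 0.05
z_crit = stats.norm.ppf(1 - alpha / 2)
odr = float(np.mean(z_scores > z_crit))
print(f"Observed Discovery Rate: {odr:.2f}")

# The Expected Discovery Rate (EDR) and Expected Replication Rate (ERR) reported
# in the abstract are estimated by fitting the z-curve mixture model (truncated
# folded normal components) to the significant z-scores; that estimation step is
# not implemented in this sketch.
```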
License
Copyright (c) 2025 Cristian Mesquida, Jennifer Murphy, Joe Warne, Daniël Lakens (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.