This is an outdated version published on 2022-03-01. Read the most recent version.
Preprint / Version 2

Replication concerns in sports science

a narrative review of selected methodological issues in the field


  • Cristian Mesquida, Centre of Applied Science for Health, Technological University Dublin
  • Jennifer Murphy, Centre of Applied Science for Health, Technological University Dublin
  • Daniël Lakens, Human-Technology Interaction Group, Eindhoven University of Technology
  • Joe Warne, Centre of Applied Science for Health, Technological University Dublin



replicability, publication bias, statistical power, questionable research practices


Methodological issues such as publication bias, questionable research practices (QRPs) and underpowered study designs are known to decrease the replicability of scientific findings. The presence of such issues has been widely established across different research fields, especially in psychology. Their presence raised the first concerns that the replicability of scientific findings could be low and led researchers to conduct large replication projects. These replication projects revealed that a substantial portion of original studies could not be replicated, giving rise to the conceptualization of the Replication Crisis. Although previous research in sports and exercise science has identified the first warning signs, such as an overwhelming proportion of significant findings, small sample sizes and a lack of open science practices, their possible consequences for the replicability of our field have been overlooked. Furthermore, the prevalence of publication bias, QRPs and underpowered designs, which are known to increase the number of false positives in a body of literature, has yet to be examined. In this review we explore the prevalence of these issues by conducting a z-curve analysis of applied studies published in the Journal of Sports Sciences. Overall, we observed evidence of publication bias and underpowered designs, raising the possibility that a portion of findings in our field may not replicate. We discuss the consequences of these issues for the replicability of our field and offer potential solutions to improve replicability.







2022-02-23 — Updated on 2022-03-01