Preprint / Version 1

Intrinsic Judgment Error in Men’s Championship World Surf League

WSL 2021




Judgment, Reliability, Validity, Surf, WSL


Surfers’ performances are scored subjectively by five judges. Low reliability and validity in
judging may lead to preventable errors and unfair scores. To describe the judgment error,
we analyzed the available WSL data for the 2021 Men’s Championship Tour (4,095 waves;
20,475 scores). We found an inverted U-shaped relationship between judgment error and
control score, described by a quadratic regression model (R = 0.52; SEE = 0.10). Reliability
was excellent as measured by the Intraclass Correlation Coefficient (95% CI = 0.97 to 1.00),
with a between-judge (typical) error of 0.15. Validity analyses indicated that a minimal real
difference of 0.49 between two surfers’ two-wave totals is needed to identify the heat winner
with 95% certainty. We recommend that the WSL incorporate the intrinsic judgment error
into its judging to increase the fairness of, and trust in, the WSL Championship Tour.
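
To make the link between the typical error and the minimal real difference concrete, the following is a minimal Python sketch of one possible error-propagation calculation. It is not the authors' stated method. It assumes that the typical error of 0.15 applies to each counted wave score, that errors are independent across waves and surfers, and that "95% certainty" corresponds to a one-sided normal quantile. The function name `minimal_real_difference` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def minimal_real_difference(typical_error, n_waves=2, confidence=0.95, one_sided=True):
    """Smallest gap between two surfers' totals that exceeds judgment noise.

    Assumptions (not taken from the source): independent errors with the
    same standard deviation on every counted wave, normally distributed.
    """
    # Error SD of one surfer's total over n_waves counted waves.
    sd_total = typical_error * math.sqrt(n_waves)
    # Error SD of the difference between two surfers' totals.
    sd_difference = sd_total * math.sqrt(2)
    # Normal quantile for the requested certainty level.
    p = confidence if one_sided else 1 - (1 - confidence) / 2
    z = NormalDist().inv_cdf(p)
    return z * sd_difference


# Typical error reported in the abstract.
print(round(minimal_real_difference(0.15), 2))                   # 0.49
print(round(minimal_real_difference(0.15, one_sided=False), 2))  # 0.59
```

Under these assumptions the one-sided value matches the 0.49 reported above, while a two-sided criterion would call for a larger gap. The agreement may be coincidental, since the abstract does not give the formula.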

