Preprint / Version 2

Unlocking Rise-Fall-Peak in Tennis

Evaluating XGBoost, TabNet, and Other Models in Predicting the Post-Bounce Peak Ball Height

Authors

  • Shane Liyanage, Data Driven Sports Analytics

DOI:

https://doi.org/10.51224/SRXIV.230

Keywords:

tennis, Neural Network, XGBoost, TabNet, Tennis On the Rise, Post Bounce Tennis, Peak Height Tennis, Tennis Analytics

Abstract

Tennis is an open skill sport in which players often have a range of choices on shot selection (Wang et al., 2013). One such choice relates to how early or late a player decides to (or is forced to) hit the ball; these decisions are often referred to as ‘taking the ball on the rise’, ‘hitting the ball at the peak of the bounce’ or ‘letting the ball drop’. Tennis coaches and commentators often speculate about the strategic advantage or disadvantage of each option for a player.

A barrier to validating these claims is that data are not readily available on how early or late a player takes the ball relative to the post-bounce peak ball height. Hawk-Eye datasets provide a data point for the post-bounce peak ball height, but when a player makes contact with the ball early, the recorded post-bounce peak is simply the point of racquet contact rather than the peak the ball would have reached, which prevents an accurate calculation of how early the player made contact.

In this paper, various models were trained on Hawk-Eye data and evaluated to determine the best model for predicting the post-bounce peak ball location. Two approaches were examined: the first, a multi-output regressor prediction (MORP), in which a single model predicts the location (x, y, z) as one prediction; the second, a separate regression prediction (SRP), in which the post-bounce peak horizontal, vertical and lateral coordinates (x, y, z) are each predicted by their own model (a sketch of both approaches follows). It is hoped that one or both approaches, given sufficient model performance, can unlock the ability to determine whether balls were taken on the rise, at the peak or on the drop.
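
As a minimal sketch of the two approaches (assuming scikit-learn's MultiOutputRegressor and the xgboost Python package; the feature matrix, targets and hyperparameters below are placeholders, not the paper's Hawk-Eye features or settings):

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from xgboost import XGBRegressor

# Placeholder data: in the paper these would be Hawk-Eye shot features
# and the observed post-bounce peak (x, y, z) locations.
rng = np.random.default_rng(0)
X = rng.random((1000, 6))   # hypothetical pre-bounce trajectory features
y = rng.random((1000, 3))   # (x, y, z) post-bounce peak coordinates

# MORP: one wrapped model predicts all three coordinates jointly.
morp = MultiOutputRegressor(XGBRegressor(n_estimators=300))
morp.fit(X, y)
xyz_pred = morp.predict(X)  # shape (n_samples, 3)

# SRP: an independent regressor per coordinate.
srp = {axis: XGBRegressor(n_estimators=300).fit(X, y[:, i])
       for i, axis in enumerate(["x", "y", "z"])}
z_pred = srp["z"].predict(X)  # one coordinate at a time
```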

A range of models was examined in this paper, from simple linear regression models to more complex neural networks such as TabNet and advanced tree-based models such as XGBoost.

We found that the XGBoost model performed best under the MORP approach when considering RMSE, R² and training time, whilst XGBoost also performed better under the SRP approach in predicting the x and y axes, and was equal in performance with TabNet in predicting the z axis.
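
As a minimal sketch of how such a comparison could be run (assuming scikit-learn; the evaluate helper, the 80/20 split and the seed are illustrative, not the paper's protocol):

```python
import time
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

def evaluate(model, X, y):
    """Fit a model and report RMSE, R² and wall-clock training time."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
    start = time.perf_counter()
    model.fit(X_tr, y_tr)
    train_time = time.perf_counter() - start
    y_pred = model.predict(X_te)
    rmse = np.sqrt(mean_squared_error(y_te, y_pred))
    return {"rmse": rmse, "r2": r2_score(y_te, y_pred), "train_time_s": train_time}
```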

The MORP approach is recommended when both the vertical and horizontal differences from the post-bounce peak height are needed, whereas the SRP approach may be preferred for calculating only the vertical height difference.

References

Wang, C.-H., Chang, C.-C., Liang, Y.-M., Shih, C.-M., Chiu, W.-S., Tseng, P., et al. (2013). Open vs. closed skill sports and the modulation of inhibitory control. PLoS ONE, 8(2), e55773. https://doi.org/10.1371/journal.pone.0055773

Schneider, A., Hommel, G., & Blettner, M. (2010). Linear regression analysis: Part 14 of a series on evaluation of scientific publications. Deutsches Ärzteblatt International, 107(44), 776–782. https://doi.org/10.3238/arztebl.2010.0776

Vidaurre, D., Bielza, C., & Larrañaga, P. (2013). A survey of L₁ regression. International Statistical Review, 81(3), 361–387. http://www.jstor.org/stable/43299642

Cortes, C., Mohri, M., & Rostamizadeh, A. (2009). L2 regularization for learning kernels. In Conference on Uncertainty in Artificial Intelligence (UAI).

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324

Nadi, A., & Moradi, H. (2019). Increasing the views and reducing the depth in random forest. Expert Systems with Applications, 138, 112801. https://doi.org/10.1016/j.eswa.2019.07.018

Ramchoun, H., Idrissi, M. A., Ghanou, Y., & Ettaouil, M. (2016). Multilayer perceptron: Architecture optimization and training. International Journal of Interactive Multimedia and Artificial Intelligence, 4(1), 26–30.

Byrd, J., & Lipton, Z. (2019). What is the effect of importance weighting in deep learning? In International Conference on Machine Learning (pp. 872–881). PMLR.

Probst, P., Wright, M. N., & Boulesteix, A.-L. (2019). Hyperparameters and tuning strategies for random forest. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 9(3), e1301. https://doi.org/10.1002/widm.1301

Arik, S. Ö., & Pfister, T. (2021). TabNet: Attentive Interpretable Tabular Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 35(8), 6679-6687. https://doi.org/10.1609/aaai.v35i8.16826

Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., & Liu, T.-Y. (2017). LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems 30 (NIPS 2017).

Dorogush, A. V., Ershov, V., & Gulin, A. (2018). CatBoost: Gradient boosting with categorical features support. arXiv preprint arXiv:1810.11363.

Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794). ACM.

Posted

2022-12-11 — Updated on 2022-12-13
