Model Selection for Count Data with Excess Number of Zero Counts

K.M. Sakthivel; C.S. Rajitha

American Journal of Applied Mathematics and Statistics. 2019, 7(1), 43-51
DOI: 10.12691/AJAMS-7-1-7

Original Research


K.M. Sakthivel¹ and C.S. Rajitha¹

¹Department of Statistics, Bharathiar University, Coimbatore-641046, Tamilnadu, India

Pub. Date: January 15, 2019


Cite this paper

K.M. Sakthivel and C.S. Rajitha. Model Selection for Count Data with Excess Number of Zero Counts. American Journal of Applied Mathematics and Statistics. 2019; 7(1):43-51. doi: 10.12691/AJAMS-7-1-7

Abstract

Zero inflated models have been widely studied in the statistical literature. The zero inflated Poisson (ZIP) model and the hurdle model are the most commonly used models for overdispersed count data. In addition, recent studies show that artificial neural networks (ANN), a nonparametric and data-dependent technique, can perform better than these models when modelling overdispersed and zero inflated count data. In this paper, we compared the performance of the zero inflated Poisson model, the hurdle model and ANN for modelling zero inflated count data in terms of standardized mean squared error (MSE), standard error (SE), bias and relative efficiency. The comparison is carried out on both a simulated data set and a real data set. To check the suitability of these three models, we also verified their group membership using three classification techniques: discriminant analysis, classification and regression trees (CART) and random forest. We proposed an algorithm for selecting the better model from a set of candidate models and computed the misclassification rates of different classifiers for a zero inflated count data set.
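To make the models and comparison criteria named above more concrete, here is a minimal Python sketch (standard library only). It implements the zero inflated Poisson and hurdle Poisson probability mass functions, along with bias, MSE, standard error, relative efficiency and misclassification rate. The function names and parameter names (`pi` for the ZIP inflation probability, `p0` for the hurdle zero probability) are illustrative assumptions. The definitions of SE (empirical standard deviation of the estimates) and relative efficiency (a ratio of MSEs) are also assumptions. The paper's standardized MSE and its model-selection algorithm are not reproduced here.

```python
import math
import statistics
from typing import Sequence


def poisson_pmf(y: int, lam: float) -> float:
    """Poisson probability P(Y = y) for mean lam > 0."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


def zip_pmf(y: int, lam: float, pi: float) -> float:
    """Zero inflated Poisson: a structural zero with probability pi,
    otherwise a Poisson(lam) count, which can itself be zero."""
    if y == 0:
        return pi + (1 - pi) * math.exp(-lam)
    return (1 - pi) * poisson_pmf(y, lam)


def hurdle_pmf(y: int, lam: float, p0: float) -> float:
    """Hurdle Poisson: every zero comes from the binary part (probability p0);
    positive counts follow a zero-truncated Poisson(lam)."""
    if y == 0:
        return p0
    return (1 - p0) * poisson_pmf(y, lam) / (1 - math.exp(-lam))


def bias(estimates: Sequence[float], true_value: float) -> float:
    """Mean of the estimates minus the true parameter value."""
    return statistics.fmean(estimates) - true_value


def mse(estimates: Sequence[float], true_value: float) -> float:
    """Mean squared error of the estimates around the true value."""
    return statistics.fmean((e - true_value) ** 2 for e in estimates)


def standard_error(estimates: Sequence[float]) -> float:
    """Assumed definition: sample standard deviation of the estimates."""
    return statistics.stdev(estimates)


def relative_efficiency(est_a: Sequence[float], est_b: Sequence[float],
                        true_value: float) -> float:
    """Assumed definition: MSE(b) / MSE(a); values above 1 favour estimator a."""
    return mse(est_b, true_value) / mse(est_a, true_value)


def misclassification_rate(true_labels: Sequence[str],
                           predicted_labels: Sequence[str]) -> float:
    """Fraction of observations whose predicted model label is wrong."""
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)


if __name__ == "__main__":
    lam = 2.0
    # Probability of a zero count under each model for the same Poisson mean.
    print("ZIP    P(Y=0):", zip_pmf(0, lam, pi=0.3))
    print("Hurdle P(Y=0):", hurdle_pmf(0, lam, p0=0.4))
    # Hypothetical true vs. predicted model labels from some classifier.
    print("Error rate:", misclassification_rate(["ZIP", "hurdle", "ANN"],
                                                ["ZIP", "ANN", "ANN"]))
```

The sketch highlights the structural difference between the two parametric models. In the ZIP model, a zero can come either from the inflation component or from the Poisson part. In the hurdle model, all zeros come from the binary component, and positive counts come from a zero-truncated Poisson distribution.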

Keywords

artificial neural networks, classifiers, discriminant analysis, hurdle model, relative efficiency, standardized mean squared error, zero inflated Poisson model

Copyright

This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
