Dr SEKA Dagou completed his Ph.D. at North Dakota State University in 1993. After a period in private entrepreneurship filled with ups and downs, he returned to academia in 2007, when he began teaching genetics and statistics.
Abstract
We used two classification methods, Gaussian Naive Bayes and Logistic Regression, to predict the genotypes of the offspring of two maize strains, the BLC and the JNE genotypes, based on the phenotypic traits of the parents. We assessed the prediction performance of the two models using the overall accuracy and the area under the receiver operating characteristic (ROC) curve. The overall accuracy of both models ranged between 72% and 82%. The area under the ROC curve was 0.79 or higher for the Logistic Regression models and 0.75 or higher for the Gaussian Naive Bayes models. These statistics indicated that the two models were very effective in predicting the genotypes of the offspring. Furthermore, both models predicted the BLC genotype with higher accuracy than the JNE genotype, which suggests that the BLC genotype is more homogeneous and more predictable. A Chi-square test for the homogeneity of the confusion matrices showed that in all cases the two models produced similar prediction results. That finding is in line with Mitchell (2010), who showed theoretically that the two models are essentially the same. With Logistic Regression, each subset of the original data and its corresponding principal components produced exactly the same prediction results.
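The comparison of confusion matrices described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the Chi-square test was set up, so this assumes each 2x2 confusion matrix is flattened into four cell counts and the two models form the rows of a 2x4 contingency table (3 degrees of freedom). The counts in `cm_lr` and `cm_gnb` are hypothetical and are not the study's results; the function names are likewise illustrative.

```python
# Hypothetical confusion matrices (rows: true BLC, true JNE;
# columns: predicted BLC, predicted JNE). These counts are invented
# for illustration only and are NOT taken from the study.
cm_lr = [[45, 5], [13, 37]]    # assumed Logistic Regression result
cm_gnb = [[44, 6], [15, 35]]   # assumed Gaussian Naive Bayes result


def accuracy(cm):
    """Overall accuracy: correctly classified cases over all cases."""
    total = sum(sum(row) for row in cm)
    return (cm[0][0] + cm[1][1]) / total


def chi2_homogeneity(cm_a, cm_b):
    """Pearson Chi-square statistic and degrees of freedom for the
    homogeneity of two confusion matrices, each flattened to one row.
    Assumes every cell position has a nonzero combined count."""
    rows = [[c for r in cm_a for c in r], [c for r in cm_b for c in r]]
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df


# Critical value of the Chi-square distribution for df = 3, alpha = 0.05.
CRITICAL_DF3_05 = 7.815

stat, df = chi2_homogeneity(cm_lr, cm_gnb)
print("accuracy LR :", accuracy(cm_lr))
print("accuracy GNB:", accuracy(cm_gnb))
print("chi-square  :", round(stat, 4), "with df =", df)
print("homogeneous at 5% level:", stat < CRITICAL_DF3_05)
```

A statistic below the critical value means the hypothesis that the two models distribute their predictions in the same way is not rejected, which is the sense in which the abstract calls the prediction results similar.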