dc.contributor.author: Piles, Miriam
dc.contributor.author: Bergsma, Rob
dc.contributor.author: Gianola, Daniel
dc.contributor.author: Gilbert, Hélène
dc.contributor.author: Tusell, Llibertat
dc.contributor.other: Producció Animal [ca]
dc.date.accessioned: 2021-03-02T08:28:18Z
dc.date.available: 2021-03-02T08:28:18Z
dc.date.issued: 2021-02-22
dc.identifier.citation: Piles, Miriam, Rob Bergsma, Daniel Gianola, Hélène Gilbert, and Llibertat Tusell. 2021. "Feature Selection Stability and Accuracy of Prediction Models for Genomic Prediction of Residual Feed Intake in Pigs Using Machine Learning". Frontiers in Genetics 12. doi:10.3389/fgene.2021.611506. [ca]
dc.identifier.issn: 1664-8021 [ca]
dc.identifier.uri: http://hdl.handle.net/20.500.12327/1165
dc.description.abstract: Feature selection (FS, i.e., selection of a subset of predictor variables) is essential in high-dimensional datasets to prevent overfitting of prediction/classification models and reduce computation time and resources. In genomics, FS allows identifying relevant markers and designing low-density SNP chips to evaluate selection candidates. In this research, several univariate and multivariate FS algorithms combined with various parametric and non-parametric learners were applied to the prediction of feed efficiency in growing pigs from high-dimensional genomic data. The objective was to find the best combination of feature selector, SNP subset size, and learner leading to accurate and stable (i.e., less sensitive to changes in the training data) prediction models. Genomic best linear unbiased prediction (GBLUP) without SNP pre-selection was the benchmark. Three types of FS methods were implemented: (i) filter methods: univariate (univ.dtree, spearcor) or multivariate (cforest, mrmr), with random selection as benchmark; (ii) embedded methods: elastic net and least absolute shrinkage and selection operator (LASSO) regression; (iii) combination of filter and embedded methods. Ridge regression, support vector machine (SVM), and gradient boosting (GB) were applied after pre-selection performed with the filter methods. Data represented 5,708 individual records of residual feed intake to be predicted from the animal’s own genotype. Accuracy (stability of results) was measured as the median (interquartile range) of the Spearman correlation between observed and predicted data in a 10-fold cross-validation. The best prediction in terms of accuracy and stability was obtained with SVM and GB using 500 or more SNPs [0.28 (0.02) and 0.27 (0.04) for SVM and GB with 1,000 SNPs, respectively]. With larger subset sizes (1,000–1,500 SNPs), the filter method had no influence on prediction quality, which was similar to that attained with a random selection. With 50–250 SNPs, the FS method had a huge impact on prediction quality: it was very poor for tree-based methods combined with any learner, but good and similar to what was obtained with larger SNP subsets when spearcor or mrmr were implemented with or without embedded methods. Those filters also led to very stable results, suggesting their potential use for designing low-density SNP chips for genome-based evaluation of feed efficiency. [ca]
dc.format.extent: 14 [ca]
dc.language.iso: eng [ca]
dc.publisher: Frontiers Media [ca]
dc.relation.ispartof: Frontiers in Genetics [ca]
dc.rights: Attribution 4.0 International [ca]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [*]
dc.title: Feature Selection Stability and Accuracy of Prediction Models for Genomic Prediction of Residual Feed Intake in Pigs Using Machine Learning [ca]
dc.type: info:eu-repo/semantics/article [ca]
dc.description.version: info:eu-repo/semantics/publishedVersion [ca]
dc.rights.accessLevel: info:eu-repo/semantics/openAccess
dc.embargo.terms: cap [ca]
dc.relation.projectID: EC/H2020/633531/EU/Adapting the feed, the animal and the feeding techniques to improve the efficiency and sustainability of monogastric livestock production systems/Feed-a-Gene [ca]
dc.relation.projectID: MICIU/Programa Estatal de I+D+I orientada a los retos de la sociedad/RTI2018-097610-R-I00/ES/MEJORA DE LA EFECTIVIDAD Y LA VIABILIDAD DE LOS PROGRAMAS DE SELECCION GENETICA PARA AUMENTAR LA EFICIENCIA ALIMENTARIA DE ESPECIES PROLIFICA/ [ca]
dc.subject.udc: 619 [ca]
dc.identifier.doi: https://doi.org/10.3389/fgene.2021.611506 [ca]
dc.contributor.group: Genètica i Millora Animal [ca]
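
The abstract in this record reports accuracy as the median, and stability as the interquartile range, of the Spearman correlation between observed and predicted values across 10 cross-validation folds, and names a Spearman-correlation filter (spearcor) among the feature selectors. The sketch below illustrates those two ideas in plain Python. It is not the authors' implementation: the function names, the toy data, the use of absolute correlation to rank SNPs, and the "inclusive" quartile method are all assumptions made for illustration, and the predictions are taken as given rather than produced by a learner trained on the other folds.

```python
import random
import statistics


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant vector as uninformative
    return cov / (var_x * var_y) ** 0.5


def select_top_snps(snp_columns, phenotype, k):
    """Hypothetical spearcor-style filter: keep the k SNPs with largest |rho|."""
    scores = [abs(spearman(col, phenotype)) for col in snp_columns]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def fold_correlations(observed, predicted, n_folds=10, seed=0):
    """Spearman correlation within each of n_folds random folds."""
    indices = list(range(len(observed)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    return [
        spearman([observed[i] for i in fold], [predicted[i] for i in fold])
        for fold in folds
    ]


def summarize_folds(correlations):
    """Median (accuracy) and interquartile range (stability) across folds."""
    q1, _, q3 = statistics.quantiles(correlations, n=4, method="inclusive")
    return statistics.median(correlations), q3 - q1


if __name__ == "__main__":
    rng = random.Random(1)
    observed = [rng.gauss(0, 1) for _ in range(200)]      # toy phenotypes
    predicted = [o + rng.gauss(0, 2) for o in observed]   # toy noisy predictions
    median, iqr = summarize_folds(fold_correlations(observed, predicted))
    print(f"median Spearman = {median:.2f}, IQR = {iqr:.2f}")
```

In a real analysis the filter would be applied inside each training fold only, so that the held-out animals never influence which SNPs are selected.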



Attribution 4.0 International
Except where otherwise noted, this item's license is described as http://creativecommons.org/licenses/by/4.0/