Dissecting Root, Shoot, and Water-Use Traits in Soybean Using Genomic Prediction and Explainable Machine Learning
DOI: https://doi.org/10.5147/pggb.266
Abstract
Improving drought resilience in soybean requires a detailed understanding of the genetic basis of root architecture, shoot biomass, and water-use traits. Using the well-characterized Essex × Forrest recombinant inbred line (RIL) population (n = 94), we combined classical composite interval mapping (CIM) with interpretable machine learning (ML) models to dissect 15 traits related to early vigor and drought tolerance. Genomic prediction models, including Ridge Regression and XGBoost, were trained on 370 molecular markers. XGBoost achieved the highest predictive accuracy of the models tested (R² up to 0.72), particularly for biomass-related traits. SHAP (SHapley Additive exPlanations) analysis quantified the contribution of each marker to model predictions, identifying both previously reported QTL and novel loci with directional effects. Several high-importance markers aligned with QTL reported by Williams et al. (2012) and Salvador et al. (2012), supporting the biological validity of the ML-based approach. Traits such as relative water content (RWC), root fresh weight (RFW), and shoot dry weight (SDW) were effectively modeled, and markers on chromosomes 1, 8, 10, and 18 emerged as pleiotropic hotspots. This integrative framework demonstrates the value of explainable AI in plant genomics and provides a pipeline for future marker-assisted selection in soybean breeding.
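The genomic-prediction and SHAP steps summarized above can be illustrated with a minimal sketch. It is not the authors' pipeline and makes several assumptions: genotypes and the trait are synthetic (no data from the Essex × Forrest population), markers use a hypothetical 0/1 parental-allele coding, and only Ridge Regression is shown (neither XGBoost nor the shap library is used). For a linear model with independent features, exact SHAP values reduce to beta_j * (x_ij - mean_j), so marker importance can be computed directly from the fitted coefficients.

```python
import numpy as np

# Synthetic stand-in for a RIL marker matrix (94 lines x 370 markers).
# Hypothetical 0/1 coding for the two parental alleles; RILs are largely
# homozygous, so heterozygous calls are ignored in this sketch.
rng = np.random.default_rng(0)
n_lines, n_markers = 94, 370
X = rng.integers(0, 2, size=(n_lines, n_markers)).astype(float)

# Hypothetical trait: three "causal" markers plus noise (not real QTL effects).
true_beta = np.zeros(n_markers)
true_beta[[10, 150, 300]] = [1.5, -1.0, 0.8]
y = X @ true_beta + rng.normal(0.0, 0.5, size=n_lines)


def fit_ridge(X, y, lam):
    """Closed-form ridge regression on centred data; returns (intercept, beta, x_mean)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y - y_mean))
    return y_mean, beta, x_mean


def predict(model, X):
    intercept, beta, x_mean = model
    return intercept + (X - x_mean) @ beta


def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def linear_shap(model, X):
    """Exact SHAP values for a linear model under feature independence,
    using the training means as the background: phi_ij = beta_j * (x_ij - mean_j)."""
    _, beta, x_mean = model
    return (X - x_mean) * beta


# Random split of lines into 70 training and 24 test lines.
idx = rng.permutation(n_lines)
train, test = idx[:70], idx[70:]
model = fit_ridge(X[train], y[train], lam=10.0)
test_r2 = r_squared(y[test], predict(model, X[test]))

# Rank markers by mean |SHAP| over the training lines.
phi = linear_shap(model, X[train])
importance = np.abs(phi).mean(axis=0)
top_markers = np.argsort(importance)[::-1][:5]
print(f"test R^2 = {test_r2:.2f}; top markers: {top_markers.tolist()}")
```

Because the per-marker SHAP values plus the intercept sum exactly to each prediction, ranking markers by mean absolute SHAP value gives an additive attribution, and the sign of each coefficient indicates which parental allele increases the trait. A tree ensemble such as XGBoost would instead require a tree-specific SHAP algorithm.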