Affiliations 

  • 1 Division of Plant Physiology, Indian Council of Agricultural Research-Indian Agricultural Research Institute (ICAR-IARI), New Delhi, India
  • 2 Division of Genetics, Indian Council of Agricultural Research-Indian Agricultural Research Institute (ICAR-IARI), New Delhi, India
  • 3 Division of Agricultural Physics, Indian Council of Agricultural Research-Indian Agricultural Research Institute (ICAR-IARI), New Delhi, India
Front Plant Sci, 2023;14:1206357.
PMID: 37771485 DOI: 10.3389/fpls.2023.1206357

Abstract

Among seed attributes, weight is one of the main factors determining the soybean harvest index. Recently, the focus of soybean breeding has shifted to improving seed size and weight for crop optimization in terms of seed and oil yield. With recent technological advancements, there is an increasing application of imaging sensors that provide simple, real-time, non-destructive, and inexpensive image data for rapid image-based prediction of seed traits in plant breeding programs. The present work is related to digital image analysis of seed traits for the prediction of hundred-seed weight (HSW) in soybean. The image-based seed architectural traits (i-traits) measured were area size (AS), perimeter length (PL), length (L), width (W), length-to-width ratio (LWR), intersection of length and width (IS), seed circularity (CS), and distance between IS and CG (DS). The phenotypic investigation revealed significant genetic variability among 164 soybean genotypes for both i-traits and manually measured seed weight. Seven popular machine learning (ML) algorithms, namely Simple Linear Regression (SLR), Multiple Linear Regression (MLR), Random Forest (RF), Support Vector Regression (SVR), LASSO Regression (LR), Ridge Regression (RR), and Elastic Net Regression (EN), were used to create models that can predict the weight of soybean seeds based on the image-based novel features derived from the Red-Green-Blue (RGB)/visual image. Among the models, random forest and multiple linear regression models that use multiple explanatory variables related to seed size traits (AS, L, W, and DS) were identified as the best models for predicting seed weight with the highest prediction accuracy (coefficient of determination, R2=0.98 and 0.94, respectively) and the lowest prediction error, i.e., root mean square error (RMSE) and mean absolute error (MAE). Finally, principal components analysis (PCA) and a hierarchical clustering approach were used to identify IC538070 as a superior genotype with a larger seed size and weight. The identified donors/traits can potentially be used in soybean improvement programs.

* Title and MeSH Headings from MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine.