-
A variable importance criterion for variable selection in near-infrared spectral analysis
Abstract: Variable selection is a universal problem in building multivariate calibration models, such as quantitative structure-activity relationship (QSAR) models and models relating a quantity or property to spectral data. Reducing the bias induced by uninformative variables can significantly improve the prediction ability of such models. This study proposes a new criterion, named C, to evaluate the importance of the variables in a model. The value of C is defined as the average contribution of a variable to the model and is calculated from the statistics of models built with different combinations of the variables. In the calculation, a large number of partial least squares (PLS) models are built, each using a subset of variables selected by random re-sampling. This yields a vector of prediction errors, expressed as the root mean squared error of cross validation (RMSECV), and a matrix of 1s and 0s indicating the selected and unselected variables. If multiple linear regression (MLR) is used to model the relationship between the RMSECVs and this matrix, the coefficients of the MLR model can serve as a criterion for evaluating the contribution of each variable to the RMSECV. To improve the efficiency of the method, a multi-step shrinkage strategy is used. The method was compared with Monte Carlo uninformative variable elimination (MC-UVE), the randomization test (RT) and competitive adaptive reweighted sampling (CARS) on three NIR benchmark datasets. The results show that the proposed criterion is effective for selecting informative variables from the spectra and improves the prediction ability of the models.
Keywords: multivariate calibration, multi-step strategy, variable selection, near-infrared spectroscopy
Updated: 2025-09-23 15:23:52
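The core of criterion C (regressing each model's RMSECV on the 0/1 inclusion matrix and reading one MLR coefficient per variable) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: PLS fitting and cross-validation are replaced by a hypothetical stand-in, `toy_rmsecv`; each variable is included independently with probability 0.5, because the abstract does not specify the re-sampling scheme; an intercept column is added to the design; and the multi-step shrinkage strategy is omitted. All function names are illustrative.

```python
import random


def solve_least_squares(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def toy_rmsecv(mask):
    """Hypothetical stand-in for 'build a PLS model on the selected variables
    and return its RMSECV'. Variables 0-2 lower the error when included;
    every other variable raises it slightly."""
    error = 1.0
    for j, used in enumerate(mask):
        if used:
            error += -0.2 if j < 3 else 0.02
    return error


def variable_contributions(n_vars, n_models, error_fn, seed=0):
    """Return one MLR coefficient per variable: the average change in
    RMSECV associated with including that variable."""
    rng = random.Random(seed)
    design, errors = [], []
    for _ in range(n_models):
        # Random re-sampling (assumed): include each variable with probability 0.5
        mask = [1 if rng.random() < 0.5 else 0 for _ in range(n_vars)]
        design.append([1] + mask)  # leading 1 is the intercept column
        errors.append(error_fn(mask))
    coefficients = solve_least_squares(design, errors)
    return coefficients[1:]  # drop the intercept


contributions = variable_contributions(n_vars=8, n_models=200, error_fn=toy_rmsecv, seed=1)
for j, c in enumerate(contributions):
    print(f"variable {j}: {c:+.3f}")  # variables 0-2 come out near -0.200, the rest near +0.020
```

In this sketch a negative coefficient means that including the variable lowers the RMSECV on average, so it marks an informative variable. Drawing each variable independently, rather than fixing the subset size, keeps the intercept column from being an exact linear combination of the inclusion columns, which would make the normal equations singular.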
-
Use of A Portable Camera for Proximal Soil Sensing with Hyperspectral Image Data
Abstract: In proximal soil sensing with visible and near-infrared spectroscopy, currently available hyperspectral snapshot cameras allow rapid image data acquisition in a portable mode. This study describes how readings of a hyperspectral camera in the 450–950 nm region can be used to estimate soil parameters, namely soil organic carbon (OC), hot-water extractable C, total nitrogen and clay content; readings were taken in the lab on raw samples without any crushing. As multivariate methods, we used PLSR with full spectra (FS) and PLSR combined with two conceptually different methods of spectral variable selection (CARS, “competitive adaptive reweighted sampling”, and IRIV, “iteratively retaining informative variables”). For the accuracy of the estimates, it was beneficial to use segmented images instead of image mean spectra; for segmentation we applied both a regular decomposition into sub-images of equal size and k-means clustering. With FS-PLSR on image mean spectra, the estimates were not useful: RPD values were below 1.50 and R² was 0.51 in the best case. With segmented images, marked improvements were obtained for all soil properties, with RPD ≥ 1.68 and R² ≥ 0.66. For all image data and variables, IRIV-PLSR slightly outperformed CARS-PLSR.
Keywords: spectral variable selection, hyperspectral snapshot camera, partial least squares regression, multivariate calibration, hyperspectral imaging, proximal soil sensing
Updated: 2025-09-23 15:22:29
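The RPD and R² thresholds quoted in this abstract are easier to interpret with the definitions in hand. The sketch below assumes common conventions that the abstract does not state: R² = 1 - SS_res/SS_tot on the validation predictions, and RPD = sample standard deviation of the reference values divided by the RMSE of prediction (some authors divide by the bias-corrected SEP instead). The function names and example values are illustrative, not data from the study.

```python
import math
from statistics import mean, stdev


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error of prediction."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def rpd(y_true, y_pred):
    """Ratio of performance to deviation (assumed convention): SD of the
    reference values divided by the RMSE of prediction."""
    return stdev(y_true) / rmse(y_true, y_pred)


# Illustrative reference and predicted values (not data from the study)
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
print(f"R2 = {r2_score(reference, predicted):.3f}, RPD = {rpd(reference, predicted):.2f}")
# R2 = 0.990, RPD = 11.18
```

Under these definitions RPD² = n/(n - 1) · 1/(1 - R²), so for large validation sets RPD ≈ 1/sqrt(1 - R²): an RPD of 1.5 corresponds to an R² of about 0.56, and an RPD of 1.68 to an R² of about 0.65.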
-
Calibration of near infrared spectroscopy (NIRS) data of three Eucalyptus species with extractive contents determined by ASE extraction for rapid identification of species and high extractive contents
Abstract: Plantations of naturally durable timber species could substitute for unsustainably harvested wood from tropical forests or for wood treated with toxic preservatives. The New Zealand Dryland Forests Initiative (NZDFI) has established a tree-breeding program to develop genetically improved planting stock for durable eucalyptus plantations. In this study the durable heartwood of Eucalyptus bosistoana, Eucalyptus globoidea and Eucalyptus argophloia was characterized by near infrared (NIR) spectroscopy, and the NIR data were calibrated against the extractives content (EC), determined by accelerated solvent extraction (ASE), using a partial least squares regression (PLSR) model. EC could be predicted in the range of 0.34–18.9% with a root mean square error (RMSE) of 0.9%. Moreover, the three species could be differentiated by NIR spectroscopy with 100% accuracy, i.e. NIR spectroscopy is able to segregate timbers from mixed-species forest plantations.
Keywords: variable selection (sMC), Eucalyptus argophloia, E. bosistoana, partial least squares regression (PLSR), E. globoidea, PLS-discriminant analysis (PLS-DA)
Updated: 2025-09-19 17:15:36
-
An improved weighted multiplicative scatter correction algorithm with the use of variable selection: Application to near-infrared spectra
Abstract: Multiplicative light scattering poses a great challenge in near-infrared (NIR) quantitative analysis. When the scattering parameters are estimated, variables that are uninformative about the scattering effects may bias the estimation. Weighted least squares (WLS) can be used to avoid the influence of these uninformative variables. In this work, we propose an improved weighted multiplicative scatter correction algorithm with the use of variable selection (WMSCVS). The baseline is removed first, and variable selection is then used to obtain the optimal WLS weights for estimating the multiplicative parameters. The variable selection algorithm, designed on the basis of model population analysis (MPA), is an iterative optimization process. In each iteration, weighted bootstrap sampling (WBS) is used to generate variable subsets, and an exponentially decreasing function (EDF) controls the number of sampled variables. The interpretability and stability of the variable selection results, as well as the predictive performance of the corrected spectra, were investigated using two NIR datasets. The experimental results showed that WMSCVS gave better predictive performance than state-of-the-art correction methods.
Keywords: Model population analysis, Weighted least squares, Multiplicative scatter correction, Variable selection
Updated: 2025-09-19 17:15:36
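The two sampling components named in this abstract, an EDF that sets how many variables are sampled and weighted bootstrap sampling that draws them, can be sketched in plain Python. This is a hedged illustration, not the WMSCVS code: the EDF uses the form popularised by CARS (the fraction of retained variables falls from 1 to 2/p over N iterations), which may differ from the paper's exact parameterisation, and the weighted bootstrap draws variable indices with replacement in proportion to their weights and keeps the distinct ones. The MPA rule that updates the weights between iterations is not given in the abstract and is not shown; `edf_ratio` and `weighted_bootstrap` are illustrative names.

```python
import math
import random


def edf_ratio(i, n_iter, n_vars):
    """Exponentially decreasing function (CARS-style, assumed here): fraction
    of variables sampled at iteration i (1-based), equal to 1 at i = 1 and
    2 / n_vars at i = n_iter."""
    a = (n_vars / 2) ** (1 / (n_iter - 1))
    k = math.log(n_vars / 2) / (n_iter - 1)
    return a * math.exp(-k * i)


def weighted_bootstrap(weights, n_draws, rng):
    """Draw n_draws variable indices with replacement, each with probability
    proportional to its weight, and return the sorted distinct indices."""
    picks = rng.choices(range(len(weights)), weights=weights, k=n_draws)
    return sorted(set(picks))


rng = random.Random(0)
n_vars, n_iter = 100, 50
weights = [1.0] * n_vars  # placeholder: uniform weights; an MPA step would update these
for i in (1, 10, 25, 50):
    n_keep = max(2, round(edf_ratio(i, n_iter, n_vars) * n_vars))
    subset = weighted_bootstrap(weights, n_keep, rng)
    # Duplicates from sampling with replacement make len(subset) <= n_keep
    print(i, n_keep, len(subset))
```

Because the draw is made with replacement, variables with large weights tend to appear several times and survive, while low-weight variables drop out as the EDF shrinks the draw size.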
-
Potato hierarchical clustering and doneness degree determination by near-infrared (NIR) and attenuated total reflectance mid-infrared (ATR-MIR) spectroscopy
Abstract: Near-infrared (NIR) and attenuated total reflectance mid-infrared (ATR-MIR) spectroscopy were used to identify potato varieties and determine the doneness degree of potatoes. The potato tuber varieties were successfully classified by hierarchical cluster analysis (HCA), and the partial least squares regression (PLSR) model gave good predictions of doneness degree. The principal component and first-derivative iteration algorithm (PCFIA) was introduced to select feature variables instead of using full-wavelength spectra for modelling. Based on two sets of feature variables selected from the NIR and MIR regions, both NIR–PCFIA–HCA and MIR–PCFIA–HCA showed better hierarchical clustering performance. Moreover, the NIR–PCFIA–PLSR and MIR–PCFIA–PLSR models were used effectively to predict tuber doneness degree, yielding an RP as high as 0.935 and an RMSEP as low as 0.503. It is concluded that PCFIA is an effective approach for feature variable selection, and that both NIR and MIR spectroscopic techniques are capable of classifying potato varieties and determining potato doneness degree.
Keywords: HCA, ATR-MIR, Potato, Variable selection, PLSR, NIR
Updated: 2025-09-19 17:15:36
-
Estimation of forest structural and compositional variables using ALS data and multi-seasonal satellite imagery
Abstract: Advanced forest resource inventory (FRI) information is of critical importance for sustainable forest management. FRIs depend on remote sensing data and processing methods, together with field calibration and validation, to provide cost-effective options for modelling forest inventory and biophysical variables over large areas. The objective of this study was to examine the impact of combining multi-seasonal multispectral satellite imagery with airborne laser scanning (ALS) data for estimating basal area, species mixture and stem density in an uneven-aged tolerant hardwood forest in Ontario, Canada. Using random forest (RF) regression as a non-parametric diagnostic technique, three multispectral optical sensors (i.e., Landsat-5 TM, Sentinel-2A and WorldView-2) were compared to identify the most cost-effective sensor configuration for modelling FRI variables. The contributions of spectral predictors derived from these optical sensors, as well as of ALS height and intensity metrics, were evaluated using RF variable importance. As part of our variable selection framework, all predictor variables were grouped into relatively independent clusters using a hierarchical variable clustering technique, which revealed that spectral predictors, height-based metrics and intensity-based metrics carry distinct information. This indicates that ALS intensity data carry unique information complementary to passive near-infrared data for forest characterization. ALS data alone did not produce accurate models for basal area and species mixture, but predictive accuracy improved significantly with the addition of spectral predictors. Compared with single-date images, multi-seasonal imagery proved more accurate for modelling FRI variables, especially when combined with ALS data. Despite its limited spatial resolution, Sentinel-2A was found to be the most cost-effective image source for enhancing ALS-based FRI models.
Using variables identified by the variable selection procedure, best subsets regression outperformed the RF models developed for diagnostic analysis, resulting in a suite of accurate and parsimonious predictive models, with coefficients of determination of 0.73, 0.90 and 0.67, for basal area, species mixture, and stem density, respectively.
Keywords: Multi-seasonal satellite imagery, Variable selection, Sentinel-2A, Airborne laser scanning (ALS), Forest resource inventory
Updated: 2025-09-10 09:29:36
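Hierarchical variable clustering groups predictors that carry similar information, so that one representative per cluster can be kept. The sketch below is a deliberately simple stand-in, not the technique used in the study: it performs average-linkage agglomerative clustering with the distance 1 - |r| (Pearson correlation) and stops merging once the closest pair of clusters is farther apart than a chosen threshold. The variable names and values in the example are made up for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cluster_variables(columns, max_distance):
    """Average-linkage agglomerative clustering of variables with distance
    1 - |r|; merging stops when the closest clusters exceed max_distance."""
    names = list(columns)
    dist = {frozenset((a, b)): 1 - abs(pearson(columns[a], columns[b]))
            for a, b in combinations(names, 2)}
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Average pairwise distance between members of two clusters
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[i], clusters[j]) > max_distance:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical predictors: two height-like metrics and two spectral bands
columns = {
    "height_a": [1, 2, 3, 4, 5],
    "height_b": [2, 4, 6, 8, 10],
    "spectral_a": [6, 3, 5, 7, 4],
    "spectral_b": [12, 6, 10, 14, 8],
}
print(cluster_variables(columns, max_distance=0.5))
# [['height_a', 'height_b'], ['spectral_a', 'spectral_b']]
```

Using |r| treats strongly negatively correlated predictors as redundant too, which suits the goal of grouping variables by shared information rather than by direction of effect.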