Document type:
Journal article: Original article
Permanent URL:
Title:
Solving the missing value problem in PCA by Orthogonalized-Alternating Least Squares (O-ALS)
Author(s):
Gomez Sanchez, Adrian [Author]
Laboratoire Avancé de Spectroscopie pour les Intéractions la Réactivité et l'Environnement (LASIRE) - UMR 8516
Vitale, Raffaele [Author]
Laboratoire Avancé de Spectroscopie pour les Intéractions la Réactivité et l'Environnement (LASIRE) - UMR 8516
Ruckebusch, Cyril [Author]
Laboratoire Avancé de Spectroscopie pour les Intéractions la Réactivité et l'Environnement (LASIRE) - UMR 8516
De Juan, Anna [Author]
Universitat de Barcelona [UB]
Journal title:
Chemometrics Intell. Lab. Syst.
Journal short name:
Chemometrics Intell. Lab. Syst.
Number:
250
Pages:
-
Publication date:
2024-11-18
ISSN:
0169-7439
English keyword(s):
Principal Component Analysis (PCA)
Missing values
Nonlinear Estimation by Iterative Partial Least Squares (NIPALS)
Imputation
Singular Value Decomposition (SVD)
Orthogonalized-Alternating Least Squares (O-ALS)
HAL discipline(s):
Chemistry/Theoretical and/or physical chemistry
English abstract: [en]
Dealing with missing data poses a challenge in Principal Component Analysis (PCA), since the most common algorithms are not designed to handle them. Several approaches have been proposed to solve the missing value problem in PCA. One is Imputation based on SVD (I-SVD), where missing entries are filled by imputation and updated in every iteration until the PCA model converges. Another is an adaptation of the Nonlinear Iterative Partial Least Squares (NIPALS) algorithm, which skips the missing entries during the least-squares estimation of scores and loadings. However, limitations have been reported for both approaches. On the one hand, convergence of the I-SVD algorithm can be very slow for data sets with a high percentage of missing data. On the other hand, the orthogonality properties among scores and loadings might be lost when using NIPALS. To solve these issues and perform PCA on data sets with missing values without the need for imputation steps, a novel algorithm called Orthogonalized-Alternating Least Squares (O-ALS) is proposed. O-ALS is an alternating least-squares algorithm that estimates the scores and loadings subject to a Gram-Schmidt orthogonalization constraint. The estimation of scores and loadings is adapted to work only with the available information. In this study, the performance of O-ALS is tested and compared with NIPALS and I-SVD on simulated data sets and in a real case study. The results show that O-ALS is an accurate and fast algorithm for analyzing data with any percentage and distribution pattern of missing entries, providing correct scores and loadings in cases where I-SVD and NIPALS do not perform satisfactorily.
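The idea summarized in the abstract, alternating least-squares estimation of scores and loadings restricted to the available entries and combined with an orthogonalization step, can be illustrated with a minimal sketch. This is not the authors' published O-ALS implementation: the function name `masked_orthogonal_als`, the random initialization, the fixed iteration count, the use of a QR factorization in place of an explicit Gram-Schmidt procedure, and the absence of mean-centering are all assumptions made here for illustration only.

```python
import numpy as np


def masked_orthogonal_als(X, n_components, n_iter=200, seed=0):
    """Fit X ~ T @ P.T from the observed (non-NaN) entries only.

    Illustrative sketch under stated assumptions, NOT the published O-ALS code.
    """
    observed = ~np.isnan(X)
    n_rows, n_cols = X.shape
    rng = np.random.default_rng(seed)
    # Assumed initialization: random orthonormal loadings (n_cols x n_components).
    P, _ = np.linalg.qr(rng.standard_normal((n_cols, n_components)))
    T = np.zeros((n_rows, n_components))

    def update_scores():
        # One least-squares fit per row, using only that row's observed columns.
        for i in range(n_rows):
            cols = observed[i]
            T[i] = np.linalg.lstsq(P[cols], X[i, cols], rcond=None)[0]

    for _ in range(n_iter):
        update_scores()
        # One least-squares fit per column, using only that column's observed rows.
        for j in range(n_cols):
            rows = observed[:, j]
            P[j] = np.linalg.lstsq(T[rows], X[rows, j], rcond=None)[0]
        # Orthonormalize the loadings; QR yields an orthonormal basis of the
        # same column space that Gram-Schmidt would produce.
        P, _ = np.linalg.qr(P)

    # Final score update so that T is consistent with the orthonormal loadings.
    update_scores()
    return T, P


# Hypothetical demo data: noise-free rank-2 matrix with about 20% missing entries.
rng = np.random.default_rng(1)
X_true = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
missing = rng.random(X_true.shape) < 0.2
X = X_true.copy()
X[missing] = np.nan

T, P = masked_orthogonal_als(X, n_components=2)
X_hat = T @ P.T  # model reconstruction, including the missing positions
```

In this sketch no entry is ever imputed: each least-squares problem simply leaves out the missing positions. Only the loadings are orthonormalized, so the scores returned here are not guaranteed to be mutually orthogonal.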
Language:
English
Audience:
International
Popular science:
No
Institution(s):
Université de Lille
CNRS
Collections:
Research team(s):
Dynamics, Nanoscopy & Chemometrics (DyNaChem)
Deposit date:
2024-11-21T22:03:42Z
2024-12-04T08:24:46Z