Document type :
Journal article: Original article
Permalink :
Title :
Solving the missing value problem in PCA by Orthogonalized-Alternating Least Squares (O-ALS)
Author(s) :
Gomez Sanchez, Adrian [Author]
Laboratoire Avancé de Spectroscopie pour les Intéractions la Réactivité et l'Environnement (LASIRE) - UMR 8516
Vitale, Raffaele [Author]
Laboratoire Avancé de Spectroscopie pour les Intéractions la Réactivité et l'Environnement (LASIRE) - UMR 8516
Ruckebusch, Cyril [Author]
Laboratoire Avancé de Spectroscopie pour les Intéractions la Réactivité et l'Environnement (LASIRE) - UMR 8516
De Juan, Anna [Author]
Universitat de Barcelona [UB]
Laboratoire Avancé de Spectroscopie pour les Intéractions la Réactivité et l'Environnement (LASIRE) - UMR 8516
Journal title :
Chemometrics and Intelligent Laboratory Systems
Abbreviated title :
Chemometrics Intell. Lab. Syst.
Volume number :
250
Pages :
-
Publication date :
2024-11-18
ISSN :
0169-7439
English keyword(s) :
Principal Component Analysis (PCA)
Missing values
Nonlinear Estimation by Iterative Partial Least Squares (NIPALS)
Imputation
Singular Value Decomposition (SVD)
Orthogonalized-Alternating Least Squares (O-ALS)
HAL domain(s) :
Chemistry/Theoretical and/or physical chemistry
English abstract : [en]
Dealing with missing data poses a challenge in Principal Component Analysis (PCA), since the most common algorithms are not designed to handle them. Several approaches have been proposed to solve the missing value problem in PCA. One is Imputation based on SVD (I-SVD), where missing entries are filled by imputation and updated in every iteration until the PCA model converges. Another is an adaptation of the Nonlinear Iterative Partial Least Squares (NIPALS) algorithm, which skips the missing entries during the least-squares estimation of scores and loadings. However, limitations have been reported for both approaches. On the one hand, convergence of the I-SVD algorithm can be very slow for data sets with a high percentage of missing data. On the other hand, the orthogonality properties among scores and loadings might be lost when using NIPALS. To solve these issues and perform PCA of data sets with missing values without the need for imputation steps, a novel algorithm called Orthogonalized-Alternating Least Squares (O-ALS) is proposed. O-ALS is an alternating least-squares algorithm that estimates the scores and loadings subject to a Gram-Schmidt orthogonalization constraint. The estimation of scores and loadings is adapted to work only with the available information. In this study, the performance of O-ALS is tested and compared with NIPALS and I-SVD on simulated data sets and in a real case study. The results show that O-ALS is an accurate and fast algorithm for analyzing data with any percentage and distribution pattern of missing entries, and that it provides correct scores and loadings in cases where I-SVD and NIPALS do not perform satisfactorily.
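The general idea stated in the abstract (alternating least-squares estimation of scores and loadings from the available entries only, with an orthogonalization step) can be illustrated with a minimal NumPy sketch. This is not the authors' O-ALS implementation, whose details are not given in this record. The function name `masked_als_pca`, the random initialization, the fixed iteration count, and the use of a QR factorization in place of an explicit Gram-Schmidt procedure are all assumptions made for illustration. Missing entries are assumed to be coded as NaN, and every row and column is assumed to contain at least `n_components` observed values.

```python
import numpy as np


def masked_als_pca(X, n_components, n_iter=100, seed=0):
    """Hypothetical sketch: fit X ~ T @ P.T using observed entries only."""
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)                 # True where a value is available
    n_rows, n_cols = X.shape
    rng = np.random.default_rng(seed)
    T = rng.standard_normal((n_rows, n_components))  # random initial scores
    P = np.zeros((n_cols, n_components))
    history = []                            # RMSE on observed entries per iteration

    for _ in range(n_iter):
        # Loadings: one least-squares problem per column, on its observed rows.
        for j in range(n_cols):
            rows = observed[:, j]
            P[j] = np.linalg.lstsq(T[rows], X[rows, j], rcond=None)[0]
        # Orthonormalize the loadings (QR used here instead of Gram-Schmidt).
        P, _ = np.linalg.qr(P)
        # Scores: one least-squares problem per row, on its observed columns.
        for i in range(n_rows):
            cols = observed[i]
            T[i] = np.linalg.lstsq(P[cols], X[i, cols], rcond=None)[0]
        residual = (X - T @ P.T)[observed]
        history.append(float(np.sqrt(np.mean(residual ** 2))))

    return T, P, history
```

In this sketch no imputed values are ever written into `X`: each least-squares problem simply omits the missing rows or columns. The returned loadings `P` have orthonormal columns by construction, while the scores `T` are not orthogonalized here, which is one of several ways this sketch may differ from the published algorithm.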
Language :
English
Audience :
International
Popular science :
No
Administrative institution(s) :
Université de Lille
CNRS
Collections :
Research team(s) :
Dynamics, Nanoscopy & Chemometrics (DyNaChem)
Submission date :
2024-11-21T22:03:42Z
2024-12-04T08:24:46Z