New efficient algorithms for multiple ...
Type de document :
Pré-publication ou Document de travail
Titre :
New efficient algorithms for multiple change-point detection with kernels
Auteur(s) :
Celisse, Alain [Auteur]
Université Lille Nord (France)
Laboratoire Paul Painlevé - UMR 8524 [LPP]
MOdel for Data Analysis and Learning [MODAL]
Briend, Guillemette [Auteur]
MOdel for Data Analysis and Learning [MODAL]
Evaluation des technologies de santé et des pratiques médicales - ULR 2694 [METRICS]
Pierre-Jean, Morgane [Auteur]
Laboratoire de Mathématiques et Modélisation d'Evry [LaMME]
Rigaill, Guillem [Auteur]
Institut des Sciences des Plantes de Paris-Saclay [IPS2 (UMR_9213 / UMR_1403)]
Université Lille Nord (France)
Laboratoire Paul Painlevé - UMR 8524 [LPP]
MOdel for Data Analysis and Learning [MODAL]
Briend, Guillemette [Auteur]
MOdel for Data Analysis and Learning [MODAL]
Evaluation des technologies de santé et des pratiques médicales - ULR 2694 [METRICS]
Pierre-Jean, Morgane [Auteur]
Laboratoire de Mathématiques et Modélisation d'Evry [LaMME]
Rigaill, Guillem [Auteur]
Institut des Sciences des Plantes de Paris-Saclay [IPS2 (UMR_9213 / UMR_1403)]
Mot(s)-clé(s) en anglais :
dynamic programming
Gram matrix
Kernel method
nonparametric change-point detection
algorithms
model selection
DNA copy number
allele B fraction
Gram matrix
Kernel method
nonparametric change-point detection
algorithms
model selection
DNA copy number
allele B fraction
Discipline(s) HAL :
Mathématiques [math]/Statistiques [math.ST]
Statistiques [stat]/Calcul [stat.CO]
Statistiques [stat]/Machine Learning [stat.ML]
Statistiques [stat]/Calcul [stat.CO]
Statistiques [stat]/Machine Learning [stat.ML]
Résumé en anglais : [en]
Several statistical approaches based on reproducing kernels have been proposed to detect abrupt changes arising in the full distribution of the observations and not only in the mean or variance. Some of these approaches ...
Lire la suite >Several statistical approaches based on reproducing kernels have been proposed to detect abrupt changes arising in the full distribution of the observations and not only in the mean or variance. Some of these approaches enjoy good statistical properties (oracle inequality, \ldots). Nonetheless, they have a high computational cost both in terms of time and memory. This makes their application difficult even for small and medium sample sizes ($n< 10^4$). This computational issue is addressed by first describing a new efficient and exact algorithm for kernel multiple change-point detection with an improved worst-case complexity that is quadratic in time and linear in space. It allows dealing with medium size signals (up to $n \approx 10^5$). Second, a faster but approximation algorithm is described. It is based on a low-rank approximation to the Gram matrix. It is linear in time and space. This approximation algorithm can be applied to large-scale signals ($n \geq 10^6$). These exact and approximation algorithms have been implemented in \texttt{R} and \texttt{C} for various kernels. The computational and statistical performances of these new algorithms have been assessed through empirical experiments. The runtime of the new algorithms is observed to be faster than that of other considered procedures. Finally, simulations confirmed the higher statistical accuracy of kernel-based approaches to detect changes that are not only in the mean. These simulations also illustrate the flexibility of kernel-based approaches to analyze complex biological profiles made of DNA copy number and allele B frequencies. An R package implementing the approach will be made available on github.Lire moins >
Lire la suite >Several statistical approaches based on reproducing kernels have been proposed to detect abrupt changes arising in the full distribution of the observations and not only in the mean or variance. Some of these approaches enjoy good statistical properties (oracle inequality, \ldots). Nonetheless, they have a high computational cost both in terms of time and memory. This makes their application difficult even for small and medium sample sizes ($n< 10^4$). This computational issue is addressed by first describing a new efficient and exact algorithm for kernel multiple change-point detection with an improved worst-case complexity that is quadratic in time and linear in space. It allows dealing with medium size signals (up to $n \approx 10^5$). Second, a faster but approximation algorithm is described. It is based on a low-rank approximation to the Gram matrix. It is linear in time and space. This approximation algorithm can be applied to large-scale signals ($n \geq 10^6$). These exact and approximation algorithms have been implemented in \texttt{R} and \texttt{C} for various kernels. The computational and statistical performances of these new algorithms have been assessed through empirical experiments. The runtime of the new algorithms is observed to be faster than that of other considered procedures. Finally, simulations confirmed the higher statistical accuracy of kernel-based approaches to detect changes that are not only in the mean. These simulations also illustrate the flexibility of kernel-based approaches to analyze complex biological profiles made of DNA copy number and allele B frequencies. An R package implementing the approach will be made available on github.Lire moins >
Langue :
Anglais
Projet ANR :
Collections :
Source :
Fichiers
- document
- Accès libre
- Accéder au document
- article.pdf
- Accès libre
- Accéder au document
- 1710.04556
- Accès libre
- Accéder au document