Document type:
Article in a scientific journal
Permanent URL:
Title:
Real-Time Monophonic and Polyphonic Audio Classification from Power Spectra
Author(s):
Baelde, Maxime [Author]
Laboratoire Paul Painlevé - UMR 8524 [LPP]
Biernacki, Christophe [Author]
Greff, Raphaël [Author]
Journal title:
Pattern Recognition
Number:
92
Pages:
82-92
Publisher:
Elsevier
Publication date:
2019-08-01
ISSN:
0031-3203
Keyword(s):
Real-time
Audio classification
Generative model
Polyphonic
Nonparametric estimation
Monophonic
Machine learning
HAL discipline(s):
Statistics [stat]/Machine Learning [stat.ML]
Abstract in English: [en]
This work addresses the recurring challenge of real-time monophonic and polyphonic audio source classification. The whole normalized power spectrum (NPS) is used directly in the proposed process, avoiding complex and risky traditional feature extraction. It is also a natural candidate for polyphonic events thanks to its additive property in such cases. The classification task is performed through nonparametric kernel-based generative modeling of the power spectrum. The advantage of this model is twofold: it is almost hypothesis-free, and it yields the maximum a posteriori classification rule for online signals straightforwardly. Moreover, it makes use of the monophonic dataset to build the polyphonic one. Then, to reach the real-time target, the complexity of the method can be tuned by a standard hierarchical clustering preprocessing of the prototypes, which gives a particularly efficient trade-off between computation time and classification accuracy. The proposed method, called RARE (for Real-time Audio Recognition Engine), shows encouraging results in both monophonic and polyphonic classification tasks on benchmark datasets and the authors' own datasets, including in the targeted real-time situation. In particular, this method has several advantages over state-of-the-art methods, including a reduced training time, no feature extraction, the ability to control the computation-accuracy trade-off, and no training on already mixed sounds for polyphonic classification.
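The classification idea summarized in the abstract can be illustrated with a minimal sketch. It is not the RARE implementation: the abstract does not state the kernel, the bandwidth, or the mixing weights, so the isotropic Gaussian kernel, the `bandwidth` value, the equal-energy mixture, and the helper names (`normalized_power_spectrum`, `map_classify`, `tone`) are all assumptions made for illustration. A plain DFT is used only to keep the example dependency-free; a real-time system would use an FFT library.

```python
import cmath
import math
import random

def normalized_power_spectrum(frame):
    """Power spectrum of one frame (plain DFT), scaled so its bins sum to 1."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(frame))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    return [p / total for p in power]

def map_classify(x, prototypes, priors, bandwidth=0.05):
    """Return the label maximising prior * kernel density estimate at x.

    prototypes maps each label to a list of NPS vectors. Assumption: an
    isotropic Gaussian kernel with one shared bandwidth, so its normalising
    constant is identical across classes and can be dropped.
    """
    scores = {}
    for label, protos in prototypes.items():
        log_k = [-sum((a - b) ** 2 for a, b in zip(p, x)) / (2 * bandwidth ** 2)
                 for p in protos]
        peak = max(log_k)  # log-sum-exp trick for numerical stability
        density = sum(math.exp(v - peak) for v in log_k) / len(log_k)
        scores[label] = math.log(priors[label]) + peak + math.log(density)
    return max(scores, key=scores.get)

def tone(freq, rng, n=128, rate=8000.0, noise=0.05):
    """Hypothetical toy source: a noisy sinusoid with a random phase."""
    phase = rng.uniform(0.0, 2.0 * math.pi)
    return [math.sin(2 * math.pi * freq * i / rate + phase) + rng.gauss(0.0, noise)
            for i in range(n)]

rng = random.Random(0)
mono = {
    "a": [normalized_power_spectrum(tone(440.0, rng)) for _ in range(8)],
    "b": [normalized_power_spectrum(tone(1000.0, rng)) for _ in range(8)],
}
# Polyphonic prototypes are built from monophonic ones only: for roughly
# uncorrelated sources power spectra add, so an equal-energy mixture has an
# NPS close to the average of the two monophonic NPS vectors.
prototypes = dict(mono)
prototypes["a+b"] = [[0.5 * (u + v) for u, v in zip(pa, pb)]
                     for pa, pb in zip(mono["a"], mono["b"])]
priors = {label: 1.0 / len(prototypes) for label in prototypes}

mixed = [u + v for u, v in zip(tone(440.0, rng), tone(1000.0, rng))]
mono_pred = map_classify(normalized_power_spectrum(tone(1000.0, rng)), prototypes, priors)
poly_pred = map_classify(normalized_power_spectrum(mixed), prototypes, priors)
```

On this toy data, the isolated 1000 Hz tone should be assigned to "b" and the two-tone mixture to "a+b", even though no mixed signal was used to build the prototypes. The hierarchical clustering step mentioned in the abstract would reduce the number of prototypes per class before classification; it is not shown here.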
Language:
English
Audience:
International
Popular science:
No
Institution(s):
CNRS
Université de Lille
Deposit date:
2020-06-08T14:10:25Z
2020-06-09T08:54:08Z
Files
- documen
- Open access
- Access the document