Real-Time Monophonic and Polyphonic Audio Classification from Power Spectra

Baelde, Maxime; Biernacki, Christophe; Greff, Raphaël

Document type:

Journal article

DOI:

10.1016/j.patcog.2019.03.017

Permanent URL:

http://hdl.handle.net/20.500.12210/29209

Title:

Real-Time Monophonic and Polyphonic Audio Classification from Power Spectra

Author(s):

Baelde, Maxime [Author]
Laboratoire Paul Painlevé - UMR 8524 [LPP]
Biernacki, Christophe [Author]

Greff, Raphaël [Author]

Journal title:

Pattern Recognition

Issue:

Pages:

82-92

Publisher:

Elsevier

Publication date:

2019-08-01

ISSN:

0031-3203

Keyword(s):

Real-time
Audio classification
Generative model
Polyphonic
Nonparametric estimation
Monophonic
Machine learning

HAL discipline(s):

Statistics [stat]/Machine Learning [stat.ML]

English abstract:

This work addresses the recurring challenge of real-time monophonic and polyphonic audio source classification. The whole normalized power spectrum (NPS) is used directly in the proposed process, avoiding complex and unreliable traditional feature extraction. It is also a natural candidate for polyphonic events thanks to its additive property in such cases. The classification task is performed through nonparametric kernel-based generative modeling of the power spectrum. The advantage of this model is twofold: it is almost hypothesis-free, and it straightforwardly yields the maximum a posteriori classification rule for online signals. Moreover, it makes use of the monophonic dataset to build the polyphonic one. Then, to reach the real-time target, the complexity of the method can be tuned by a standard hierarchical clustering preprocessing of the prototypes, revealing a particularly efficient trade-off between computation time and classification accuracy. The proposed method, called RARE (for Real-time Audio Recognition Engine), shows encouraging results in both monophonic and polyphonic classification tasks on benchmark and proprietary datasets, including the targeted real-time situation. In particular, this method has several advantages over state-of-the-art methods, including a reduced training time, no feature extraction, the ability to control the computation-accuracy trade-off, and no training on already mixed sounds for polyphonic classification.

Language:

English

Audience:

International

Popular science:

No

Institution(s):

CNRS
Université de Lille

Collections:

METRICS : Evaluation des technologies de santé et des pratiques médicales - ULR 2694

Deposit date:

2020-06-08T14:10:25Z
2020-06-09T08:54:08Z

Files:

Open access

Version number	Link	Modification date
2	20.500.12210/29209*	2020-06-09T08:53:51Z
1	20.500.12210/29209.1	2020-06-08T14:10:25Z
