Document type:
Conference paper with published proceedings
Title:
Study on Acoustic Model Personalization in a Context of Collaborative Learning Constrained by Privacy Preservation
Author(s):
Mdhaffar, Salima [Author]
Laboratoire Informatique d'Avignon [LIA]
Tommasi, Marc [Author]
Machine Learning in Information Networks [MAGNET]
Estève, Yannick [Author]
Laboratoire Informatique d'Avignon [LIA]
Conference:
SPECOM 2021 - 23rd International Conference on Speech and Computer
City:
St Petersburg
Country:
Russia
Conference start date:
2021-09-27
Book title:
Speech and Computer - 23rd International Conference, SPECOM 2021, St. Petersburg, Russia, September 27–30, 2021, Proceedings
Publication date:
2021
Keyword(s) in English:
Automatic speech recognition
Privacy-protection
Collaborative learning
Acoustic models
Personalization
HAL discipline(s):
Computer Science [cs]/Computation and Language [cs.CL]
Abstract (English):
This paper investigates different approaches to improve the performance of a speech recognition system for a given speaker, using no more than 5 minutes of speech from that speaker and without exchanging data from other users/speakers. Inspired by the federated learning paradigm, we consider speakers who have access to a personalized database of their own speech, learn an acoustic model, and collaborate with other speakers in a network to improve their model. Several local personalizations are explored, depending on how the aggregation mechanisms are performed. We study the impact of adaptively selecting a subset of speakers' models based on a notion of similarity. We also investigate the effect of weighted averaging of fine-tuned and global models. In our approach, only neural acoustic model parameters are exchanged; no audio data is exchanged. By avoiding the communication of personal data, the proposed approach tends to preserve the privacy of speakers. Experiments conducted on the TEDLIUM 3 dataset show that the best improvement is obtained by averaging a subset of different acoustic models fine-tuned on several user datasets. Applied to HMM/TDNN acoustic models, our approach quickly and significantly improves ASR performance in terms of WER (for instance, on one of our two evaluation datasets, from 14.84% to 13.45% with less than 5 minutes of speech per speaker).
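The aggregation ideas described in the abstract (selecting a subset of peer models by similarity, averaging them, then interpolating with the local fine-tuned model) can be sketched as follows. This is an illustrative toy, not the authors' implementation: models are flattened to plain parameter vectors, and the helper `cosine_similarity` and the hyperparameters `k` and `alpha` are hypothetical choices.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two flat parameter vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def personalize(local_model, peer_models, k=2, alpha=0.5):
    """Average the local model with its k most similar peer models,
    then interpolate the result with the local model (weight alpha).
    Only parameter vectors are exchanged, never audio data."""
    # Rank peers by similarity to the local model and keep the top k.
    ranked = sorted(peer_models,
                    key=lambda m: cosine_similarity(local_model, m),
                    reverse=True)
    selected = ranked[:k]
    # Uniform average over the local model and the selected peers.
    n = len(selected) + 1
    averaged = [(lw + sum(m[i] for m in selected)) / n
                for i, lw in enumerate(local_model)]
    # Weighted interpolation: fine-tuned local model vs. aggregated model.
    return [alpha * lw + (1 - alpha) * aw
            for lw, aw in zip(local_model, averaged)]
```

In a real system the parameter vectors would be neural acoustic model weights and the similarity measure, subset size, and interpolation weight would be tuned on held-out speech.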
Language:
English
Peer reviewed:
Yes
Audience:
International
Popular science:
No
Files
- https://hal.archives-ouvertes.fr/hal-03369206/document (open access)
- Personalization___SPECOM_21-4.pdf (open access)