Document type :
Communication dans un congrès avec actes
Title :
Study on Acoustic Model Personalization in a Context of Collaborative Learning Constrained by Privacy Preservation
Author(s) :
Mdhaffar, Salima [Auteur]
Laboratoire Informatique d'Avignon [LIA]
Tommasi, Marc [Auteur]
Machine Learning in Information Networks [MAGNET]
Estève, Yannick [Auteur]
Laboratoire Informatique d'Avignon [LIA]
Conference title :
SPECOM 2021 - 23rd International Conference on Speech and Computer
City :
St Petersburg
Country :
Russia
Start date of the conference :
2021-09-27
Book title :
Speech and Computer 23rd International Conference, SPECOM 2021, St. Petersburg, Russia, September 27–30, 2021, Proceedings
Publication date :
2021
English keyword(s) :
Automatic speech recognition
Privacy-protection
Collaborative learning
Acoustic models
Personalization
HAL domain(s) :
Computer Science [cs]/Computation and Language [cs.CL]
English abstract : [en]
This paper investigates approaches to improve the performance of a speech recognition system for a given speaker, using no more than 5 min of speech from that speaker and without exchanging data from other users/speakers. Inspired by the federated learning paradigm, we consider speakers who have access to a personalized database of their own speech, learn an acoustic model, and collaborate with other speakers in a network to improve their model. Several local personalizations are explored, depending on how aggregation mechanisms are performed. We study the impact of adaptively selecting a subset of speakers' models based on a notion of similarity. We also investigate the effect of weighted averaging of fine-tuned and global models. In our approach, only neural acoustic model parameters are exchanged; no audio data is exchanged. By avoiding the communication of personal data, the proposed approach tends to preserve the privacy of speakers. Experiments conducted on the TEDLIUM 3 dataset show that the best improvement is obtained by averaging a subset of different acoustic models fine-tuned on several user datasets. Applied to HMM/TDNN acoustic models, our approach quickly and significantly improves ASR performance in terms of WER (for instance, on one of our two evaluation datasets, from 14.84% to 13.45% with less than 5 min of speech per speaker).
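The aggregation idea described in the abstract (averaging a similarity-selected subset of peers' fine-tuned acoustic models, exchanging only parameters and never audio) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: models are simplified to flat NumPy parameter vectors, and the function names (`cosine_similarity`, `aggregate`) and the choice of cosine similarity as the "notion of similarity" are assumptions for illustration.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two flat parameter vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def aggregate(own_model, peer_models, top_k=2):
    """Average a speaker's own fine-tuned model with the top_k most
    similar peer models. Only parameters are shared, no audio data."""
    sims = [cosine_similarity(own_model, p) for p in peer_models]
    # Indices of peers sorted by decreasing similarity, truncated to top_k.
    selected = [peer_models[i] for i in np.argsort(sims)[::-1][:top_k]]
    # Uniform average over the speaker's model and the selected peers.
    return np.mean([own_model] + selected, axis=0)
```

A weighted variant (as in the paper's weighted averaging of fine-tuned and global models) would replace the uniform `np.mean` with `np.average(..., weights=...)`.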
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
ANR Project :
Collections :
Source :
Files
- https://hal.archives-ouvertes.fr/hal-03369206/document (Open access)
- Personalization___SPECOM_21-4.pdf (Open access)