Unsupervised Word Segmentation from Speech with Attention

Godard, Pierre; Zanon Boito, Marcely; Ondel, Lucas; Berard, Alexandre; Yvon, François; Villavicencio, Aline; Besacier, Laurent

Document type :

Conference paper with proceedings

Title :

Unsupervised Word Segmentation from Speech with Attention

Author(s) :

Godard, Pierre [Author]
Traitement du Langage Parlé [TLP]
Zanon Boito, Marcely [Author]
Laboratoire d'Informatique de Grenoble [LIG]
Ondel, Lucas [Author]
Brno University of Technology [Brno] [BUT]
Berard, Alexandre [Author]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Yvon, François [Author]
Traitement du Langage Parlé [TLP]
Villavicencio, Aline [Author]
School of Computer Science and Electronic Engineering [Essex] [CSEE]
Besacier, Laurent [Author]
Institut universitaire de France [IUF]

Conference title :

Interspeech 2018

City :

Hyderabad

Country :

India

Start date of the conference :

2018-09

HAL domain(s) :

Computer Science [cs]/Computation and Language [cs.CL]

English abstract : [en]

We present a first attempt to perform attentional word segmentation directly from the speech signal, with the final goal of automatically identifying lexical units in a low-resource, unwritten language (UL). Our methodology assumes a pairing between recordings in the UL and translations in a well-resourced language. It uses Acoustic Unit Discovery (AUD) to convert speech into a sequence of pseudo-phones, which is then segmented using neural soft-alignments produced by a neural machine translation model. Evaluation uses an actual Bantu UL, Mboshi; comparisons to monolingual and bilingual baselines illustrate the potential of attentional word segmentation for language documentation.
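The abstract describes segmenting a pseudo-phone sequence with attention soft-alignments but does not spell out the segmentation rule. The sketch below is a minimal Python illustration, assuming one common heuristic: hard-align each pseudo-phone to the translation word that receives its highest attention weight, and place a word boundary wherever that aligned word changes. The function names `segment_by_attention` and `argmax`, the matrix layout (one row per pseudo-phone, one column per translation word), the pseudo-phone labels and the toy weights are all hypothetical. This is not presented as the paper's exact procedure.

```python
from typing import List, Sequence


def argmax(row: Sequence[float]) -> int:
    """Return the index of the largest weight (first one wins on ties)."""
    return max(range(len(row)), key=lambda k: row[k])


def segment_by_attention(
    units: Sequence[str], attention: Sequence[Sequence[float]]
) -> List[List[str]]:
    """Group pseudo-phones into word-like segments (hypothetical helper).

    Assumed layout: attention[j][k] is the soft-alignment weight between
    pseudo-phone j and translation word k. Each pseudo-phone is hard-aligned
    to its most-attended translation word, and a boundary is inserted
    whenever that aligned word differs from the previous pseudo-phone's.
    """
    if len(units) != len(attention):
        raise ValueError("need exactly one attention row per pseudo-phone")
    if not units:
        return []

    segments: List[List[str]] = [[units[0]]]
    previous = argmax(attention[0])
    for unit, row in zip(units[1:], attention[1:]):
        current = argmax(row)
        if current == previous:
            segments[-1].append(unit)  # same aligned word: extend segment
        else:
            segments.append([unit])  # aligned word changed: new segment
        previous = current
    return segments


if __name__ == "__main__":
    # Toy input: five hypothetical pseudo-phone labels, two translation words.
    units = ["a", "b", "c", "d", "e"]
    attention = [
        [0.8, 0.2],
        [0.7, 0.3],
        [0.1, 0.9],
        [0.2, 0.8],
        [0.6, 0.4],
    ]
    print(segment_by_attention(units, attention))
    # → [['a', 'b'], ['c', 'd'], ['e']]
```

With this toy matrix the pseudo-phones align to translation words 0, 0, 1, 1, 0, so three segments result. The rule is deliberately simple. Diffuse attention can split one spoken word into several segments, and two adjacent spoken words aligned to the same translation word are merged into one.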

Language :

English

Peer reviewed article :

Yes

Audience :

International

Popular science :

No

ANR Project :

Breaking the Unwritten Language Barrier
Systèmes et Algorithmes Pervasifs au confluent des mondes physique et numérique
