A French Corpus for Semantic Similarity
Document type:
Other scientific communication (conference without proceedings, poster, seminar...): Conference paper with proceedings
Title:
A French Corpus for Semantic Similarity
Author(s):
Cardon, Rémi [Author]
Savoirs, Textes, Langage (STL) - UMR 8163 [STL]
Grabar, Natalia [Author]
Savoirs, Textes, Langage (STL) - UMR 8163 [STL]
Conference title:
LREC, 12th Edition of the Language Resources and Evaluation Conference
City:
Marseille
Country:
France
Conference start date:
2020-05-11
Keywords (English):
semantic similarity
manual annotation
French language
regression
HAL discipline(s):
Computer Science [cs]
Abstract (English): [en]
Semantic similarity is an area of Natural Language Processing that is useful for several downstream applications, such as machine translation, natural language generation, information retrieval, or question answering. The task consists of assessing the extent to which two sentences express the same meaning. To do so, corpora with graded pairs of sentences are required. The grade is positioned on a given scale, usually going from 0 (completely unrelated) to 5 (equivalent semantics). In this work, we introduce such a corpus for French, the first that we know of. It comprises 1,010 sentence pairs with grades from five annotators. We describe the annotation process, analyse these data, and perform a few experiments for the automatic grading of semantic similarity.
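The abstract describes sentence pairs graded on a 0 to 5 scale by five annotators, followed by experiments on automatic grading. As a minimal sketch only, assuming that the five grades are averaged into a single reference score and that a system's predicted grades are compared with that score using Pearson correlation (common practice for semantic similarity benchmarks, but not stated in this record), the two steps could look like this. The function names, the toy grades and the predictions are hypothetical and are not data from the corpus.

```python
from statistics import mean


def gold_grade(annotator_grades: list[float]) -> float:
    """Average one pair's annotator grades (0 to 5) into a single score.

    Assumption: the record does not say how the five grades are aggregated;
    the plain mean is used here for illustration.
    """
    if not annotator_grades:
        raise ValueError("at least one grade is required")
    if any(not 0 <= g <= 5 for g in annotator_grades):
        raise ValueError("grades must lie on the 0-5 scale")
    return mean(annotator_grades)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between predicted and reference grades."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant sequences")
    return cov / (sx * sy)


# Hypothetical grades from five annotators for three sentence pairs
# (toy values, not taken from the corpus).
toy_pairs = [
    [5, 5, 4, 5, 5],  # near-equivalent pair
    [0, 1, 0, 0, 0],  # unrelated pair
    [3, 2, 3, 3, 4],  # partially overlapping pair
]
gold = [gold_grade(grades) for grades in toy_pairs]

# Hypothetical outputs of an automatic grading system for the same pairs.
preds = [4.5, 0.5, 2.5]

print(f"Pearson r = {pearson(preds, gold):.3f}")
```

Averaging keeps the reference score on the same 0 to 5 scale as the individual grades; other aggregations (median, majority vote) would be equally plausible under a different annotation protocol.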
Language:
English
Peer-reviewed:
Yes
Audience:
International
Popular science:
No
Files
- cardon-LREC2020.pdf (open access): https://hal.archives-ouvertes.fr/hal-03095142/document