Creation of a domain ontology in CIDOC CRM OWL format using heterogeneous textual data related to industrial heritage

Kergosien, Eric; Smida, Kaouther; Cardon, Rémi; Grabar, Natalia; Wybo, Mathilde

Document type:

Other scientific communication (conference without proceedings, poster, seminar...)

Title:

Creation of a domain ontology in CIDOC CRM OWL format using heterogeneous textual data related to industrial heritage

Author(s):

Kergosien, Eric [Author]

Groupe d'Études et de Recherche Interdisciplinaire en Information et COmmunication - ULR 4073 [GERIICO]
Smida, Kaouther [Author]
Cardon, Rémi [Author]
Grabar, Natalia [Author]

Savoirs, Textes, Langage (STL) - UMR 8163 [STL]
Wybo, Mathilde [Author]

Title of the scientific event:

15th INTERNATIONAL ISKO CONFERENCE

City:

Porto

Country:

Portugal

Start date of the scientific event:

2018-07-09

English keyword(s):

Domain ontology construction
industrial heritage
CIDOC CRM
Text Mining
Document Analysis

HAL discipline(s):

Humanities and Social Sciences/Information and communication sciences

English abstract: [en]

The TERRE-ISTEX project aims to provide a knowledge representation that interconnects heterogeneous data related to industrial heritage, using semantic web technologies, in order to assist domain experts in producing and providing digital content. The originality of the project is its multidisciplinary approach: it helps stakeholders, experts and non-experts alike, to discover knowledge specific to their heritage through the extraction, structuring and visualization of knowledge from heterogeneous digital corpora.

According to UNESCO, which has contributed significantly to the definition of heritage (UNESCO, 1954, 1970, 1982), and then to The International Committee for the Conservation of Industrial Heritage (TICCIH, 2003), industrial heritage can be defined as:
• Material assets: buildings, machinery, equipment, workshops, factories, processing and refining sites, shops, production centers and social activities related to the textile industry;
• Immaterial assets: memories, events, festivals, collective images, intellectual production transmitted by know-how, which can be a succession of gestures dictated and displayed in production centers.

In our work, the main efforts are focused on modeling the domain stakeholders, the spatial entities and the themes, which belong to both kinds of assets.

Main goal: to provide a knowledge representation based on heterogeneous data related to industrial heritage.

A three-step methodology for the semi-automatic building of a semantic representation of the studied domain from thousands of heterogeneous documents:
1. We collect and formalize the history through interviews with stakeholders. In addition to the collected information, we use the Gephi tool to analyse relations between stakeholders.
2. Identification and extraction of information related to industrial cultural heritage from heterogeneous textual documents, combining lexicon projection with text mining methods to improve the identification of relevant data:
• Lexicon of spatial entities (regional municipalities)
• Lexicon of the domain's stakeholders (step 1)
• Thematic lexicon: combines (1) several existing specialized resources (Joconde, created by French museums; Rameau, created by the National Library of France; Wiktionary) and (2) a text mining approach based on the Word2vec algorithm to identify new terms from the processed corpus

Sources: local government (textual records, XML index, etc.), libraries (images, texts, XML index, etc.), museums (images, texts, XML index, etc.).

Method: information extraction method for creation of the ontological database.

Experiments: evaluation of spatial entity annotation on 10 articles from the French corpus and on 10 articles from the English corpus.

Ontology instantiation: extract of the domain ontology based on four heterogeneous documents, using the Protégé software (Musen et al., 1995).
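The lexicon projection mentioned in step 2 can be illustrated with a minimal sketch. It is not the project's implementation: the lexicon entries, the example sentence and the function name `project_lexicons` are hypothetical, and the matching is a plain case-insensitive whole-word search using only the Python standard library.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy lexicons for illustration only; the project's actual
# lexicons (municipalities, stakeholders, thematic terms) are not shown here.
LEXICONS: Dict[str, List[str]] = {
    "spatial_entity": ["Roubaix", "Tourcoing"],
    "stakeholder": ["Dupont family"],
    "theme": ["spinning mill", "loom"],
}

Annotation = Tuple[int, int, str, str]  # (start, end, matched text, category)


def project_lexicons(text: str, lexicons: Dict[str, List[str]]) -> List[Annotation]:
    """Annotate every whole-word, case-insensitive occurrence of a lexicon term."""
    annotations: List[Annotation] = []
    for category, terms in lexicons.items():
        for term in terms:
            # re.escape keeps multi-word terms literal; \b enforces word boundaries.
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            for match in pattern.finditer(text):
                annotations.append(
                    (match.start(), match.end(), match.group(0), category)
                )
    # Sort by position so annotations follow reading order.
    return sorted(annotations)


if __name__ == "__main__":
    sample = "The Dupont family opened a spinning mill in Roubaix."
    for start, end, surface, category in project_lexicons(sample, LEXICONS):
        print(f"{start}-{end}\t{surface}\t{category}")
```

Terms proposed by a text mining step (for example, Word2vec neighbours of known thematic terms) could be appended to the `theme` list after expert validation, which is one simple way to combine the two approaches.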

Language:

English

Peer reviewed:

Yes

Audience:

International

Popular science:

No

Collections:

Groupement d'Études et de Recherche Interdisciplinaire en Information et Communication (GERiiCO) - ULR 4073

Source:

Harvested from HAL

Files

https://halshs.archives-ouvertes.fr/halshs-01968320/document
Open access

PosterISKOV1.pdf
Open access