Document type:
Other scientific communication (conference without proceedings, poster, seminar...)
Title:
Creation of a domain ontology in CIDOC CRM OWL format using heterogeneous textual data related to industrial heritage
Author(s):
Kergosien, Eric [Author]
Groupe d'Études et de Recherche Interdisciplinaire en Information et COmmunication - ULR 4073 [GERIICO]
Smida, Kaouther [Author]
Cardon, Rémi [Author]
Grabar, Natalia [Author]
Savoirs, Textes, Langage (STL) - UMR 8163 [STL]
Wybo, Mathilde [Author]

Title of the scientific event:
15th INTERNATIONAL ISKO CONFERENCE
City:
Porto
Country:
Portugal
Start date of the scientific event:
2018-07-09
Keyword(s) in English:
Domain ontology construction
industrial heritage
CIDOC CRM
Text Mining
Document Analysis
HAL discipline(s):
Humanities and Social Sciences/Information and communication sciences
Abstract in English: [en]
The TERRE-ISTEX project aims to provide a knowledge representation that interconnects heterogeneous data related to industrial heritage, using semantic web technologies, in order to assist domain experts in producing and providing digital content. The originality of the project is its multidisciplinary approach, which helps stakeholders, both experts and non-experts, discover knowledge specific to their heritage through the extraction, structuring and visualization of knowledge from heterogeneous digital corpora.

According to UNESCO, which has contributed significantly to the definition of heritage (UNESCO, 1954, 1970, 1982), and then to The International Committee for the Conservation of Industrial Heritage (TICCIH, 2003), industrial heritage can be defined as:
• Material assets: buildings, machinery, equipment, workshops, factories, processing and refining sites, shops, production centers and social activities related to the textile industry;
• Immaterial assets: memories, events, festivals, collective images, intellectual production transmitted by know-how, which can be a succession of gestures dictated and displayed in production centers.

In our work, the main efforts are focused on modeling the domain stakeholders, the spatial entities and the thematic entities, which belong to both kinds of assets.

Main goal: to provide a knowledge representation based on heterogeneous data related to the industrial heritage.

Heterogeneous data:
• Local government (textual records, XML index, etc.)
• Libraries (images, texts, XML index, etc.)
• Museums (images, texts, XML index, etc.)

A three-step methodology for the semi-automatic building of a semantic representation of the studied domain from thousands of heterogeneous documents:
1. We collect and formalize the history through interviews with stakeholders. In addition to the collected information, we also use the Gephi tool to analyse stakeholder relations.
2. Identification and extraction of information related to industrial cultural heritage from heterogeneous textual documents, combining lexicon projection with text mining methods to improve the identification of relevant data:
• Lexicon of spatial entities (regional municipalities)
• Lexicon of the domain's stakeholders (step 1)
• Thematic lexicon: combines (1) several existing specialized resources (Joconde, created by French museums; Rameau, created by the National Library of France; Wiktionary) and (2) a text mining approach based on the Word2vec algorithm to identify new terms from the processed corpus

Method: information extraction method for the creation of the ontological database.

Experiments:
• Evaluation of spatial entity annotation on 10 articles from the French corpus
• Evaluation of spatial entity annotation on 10 articles from the English corpus

Ontology instantiation:
• Extract of the domain ontology based on four heterogeneous documents, using the Protégé software (Musen et al., 1995)
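The lexicon projection mentioned in step 2 can be illustrated with a minimal sketch. It is not the project's implementation: the lexicon entries, the sample sentence and the function name `project_lexicons` are hypothetical, and plain case-insensitive whole-word matching is assumed in place of whatever matching strategy the authors used.

```python
import re

# Hypothetical toy lexicons for illustration only; the project's real
# lexicons (municipalities, stakeholders, thematic terms) are not given here.
LEXICONS = {
    "spatial": ["Roubaix", "Tourcoing"],
    "stakeholder": ["Textile Workers Association"],
    "thematic": ["spinning mill", "loom"],
}


def project_lexicons(text, lexicons):
    """Return sorted (start, end, surface form, lexicon name) tuples
    for every lexicon entry found in the text."""
    matches = []
    for name, entries in lexicons.items():
        for entry in entries:
            # Word boundaries prevent matches inside longer words;
            # matching is case-insensitive.
            pattern = re.compile(r"\b" + re.escape(entry) + r"\b", re.IGNORECASE)
            for m in pattern.finditer(text):
                matches.append((m.start(), m.end(), m.group(0), name))
    return sorted(matches)


if __name__ == "__main__":
    sample = "The spinning mill in Roubaix supplied looms to Tourcoing."
    for start, end, surface, label in project_lexicons(sample, LEXICONS):
        print(start, end, surface, label)
```

Exact matching misses inflected forms: "looms" in the sample sentence is not matched by the entry "loom". This kind of gap is one reason to complement lexicon projection with a text mining step that proposes new candidate terms from the corpus.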
Language:
English
Peer-reviewed:
Yes
Audience:
International
Popularization:
No
Collections:
Source:
Files
- https://halshs.archives-ouvertes.fr/halshs-01968320/document
- Open access
- Access the document
- PosterISKOV1.pdf
- Open access
- Access the document