Poster communications

Creation of a domain ontology in CIDOC CRM OWL format using heterogeneous textual data related to industrial heritage

Abstract : The TERRE-ISTEX project aims to provide a knowledge representation that interconnects heterogeneous data related to industrial heritage, using semantic web technologies, in order to assist domain experts in producing and providing digital content. The originality of the project lies in its multidisciplinary approach, which helps stakeholders, experts and non-experts alike, to discover knowledge specific to their heritage through the extraction, structuring and visualization of knowledge from heterogeneous digital corpora.

According to UNESCO, which has contributed significantly to the definition of heritage (UNESCO, 1954, 1970, 1982), and to The International Committee for the Conservation of Industrial Heritage (TICCIH, 2003), industrial heritage can be defined as:
• Material assets: buildings, machinery, equipment, workshops, factories, processing and refining sites, shops, production centers and social activities related to the textile industry;
• Immaterial assets: memories, events, festivals, collective images, and intellectual production transmitted by know-how, which can be a succession of gestures dictated and displayed in production centers.

In our work, the main efforts focus on modeling the domain stakeholders, the spatial entities and the themes, which belong to both kinds of assets.

Main goal: to provide a knowledge representation based on heterogeneous data related to industrial heritage.

Heterogeneous data sources:
• Local government (textual records, XML index, etc.)
• Libraries (images, texts, XML index, etc.)
• Museums (images, texts, XML index, etc.)

A three-step methodology for the semi-automatic building of a semantic representation of the studied domain from thousands of heterogeneous documents:
1. We collect and formalize the history through interviews with stakeholders. In addition to the collected information, we use the Gephi tool to analyse the relations between stakeholders.
2. Identification and extraction of information related to industrial cultural heritage from heterogeneous textual documents, combining lexicon projection with text mining methods to improve the identification of relevant data:
• Lexicon of spatial entities (regional municipalities)
• Lexicon of the domain's stakeholders (from step 1)
• Thematic lexicon: combines (1) several existing specialized resources (Joconde, created by French museums; Rameau, created by the National Library of France; Wiktionary) and (2) a text mining approach based on the Word2vec algorithm to identify new terms in the processed corpus

Poster panels and figures:
• Method: information extraction method for creation of the ontological database
• Experiments
• Ontology instantiation
• Evaluation of spatial entity annotation on 10 articles from the French corpus
• Evaluation of spatial entity annotation on 10 articles from the English corpus
• Extract of the domain ontology based on four heterogeneous documents, using the Protégé software (Musen et al., 1995)
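The lexicon projection mentioned in step 2 of the abstract can be illustrated with a minimal sketch. It is not the project's implementation: the function names, the two small lexicons and the sample sentence are all hypothetical, and the matching strategy (case-insensitive whole-word matching, longest entry first) is one plausible choice among several.

```python
import re
from typing import NamedTuple


class Annotation(NamedTuple):
    start: int      # character offset where the match begins
    end: int        # character offset just past the match
    text: str       # surface form found in the document
    category: str   # name of the lexicon the entry came from


def build_pattern(entries):
    """Compile one regex matching any lexicon entry as a whole word.

    Longer entries are listed first so that a multi-word term wins over
    a shorter term it contains.
    """
    ordered = sorted(set(entries), key=len, reverse=True)
    alternation = "|".join(re.escape(entry) for entry in ordered)
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)


def project(text, lexicons):
    """Project every lexicon onto the text; return annotations sorted by offset."""
    found = []
    for category, entries in lexicons.items():
        if not entries:
            continue  # an empty lexicon would compile to a match-everything pattern
        for match in build_pattern(entries).finditer(text):
            found.append(
                Annotation(match.start(), match.end(), match.group(0), category)
            )
    return sorted(found)


# Hypothetical lexicons and sentence, for illustration only.
LEXICONS = {
    "spatial": ["Roubaix", "Tourcoing"],
    "thematic": ["spinning mill", "mill"],
}
SENTENCE = "The spinning mill in Roubaix closed before the one in Tourcoing."

annotations = project(SENTENCE, LEXICONS)
for annotation in annotations:
    print(annotation.category, repr(annotation.text), annotation.start, annotation.end)
```

Keeping character offsets alongside the lexicon category makes it possible to compare the projected annotations with a manually annotated reference, and to map each annotated span to an ontology instance later on.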
Document type : Poster communications
Contributor : Natalia Grabar
Submitted on : Wednesday, January 2, 2019 - 2:56:46 PM
Last modification on : Friday, December 11, 2020 - 6:24:04 PM
Long-term archiving on : Wednesday, April 3, 2019 - 3:16:52 PM


Files produced by the author(s)


  • HAL Id : halshs-01968320, version 1



Eric Kergosien, Kaouther Smida, Rémi Cardon, Natalia Grabar, Mathilde Wybo. Creation of a domain ontology in CIDOC CRM OWL format using heterogeneous textual data related to industrial heritage. 15th International ISKO Conference, Jul 2018, Porto, Portugal. ⟨halshs-01968320⟩


