Collaboratively Producing Interoperable Ontologies and Semantically Annotated Corpora: the project

Francesco Beretta 1, 2
Abstract: The Digital History department (Pôle histoire numérique) of the LARHRA laboratory in Lyon has for ten years been developing the project (Système modulaire de gestion de l’information historique), a method and a platform for collaboratively producing structured data and using them to semantically annotate TEI-encoded texts. The project aims not only to connect individual historical research and data production with a collectively managed data repository, but also to interlink the platform’s data with those published by other data providers, e.g. the authority files of national libraries, museums and other cultural heritage institutions, and to format them according to widespread standards such as CIDOC-CRM. In this way the data will be available, interoperable and reusable for new research projects, both within and outside the platform, and for the public.
In the first part of my talk, I will describe the method the project has adopted to collaboratively develop and maintain an ontology for historical data that can be extended indefinitely according to the needs of current participants and of new research projects. I will then report on the ongoing process of refining the ontology using the CIDOC-CRM modelling method. This process aims to develop a CRM extension for historical data that will be managed by a consortium, open to any interested project, and further developed according to the specific needs of participating projects.
In the second part, I will give an account of a method for semantically annotating XML-encoded texts that uses a few basic tags and properties of the TEI standard and combines them with the flexibility and richness of an ontology for historical data. The workflow integrates the corpus analysis environment TXM, which is used to explore the text from a linguistic perspective before it is annotated semantically with the project ontology. Finally, I will outline how this method makes it possible to analyse the terminology of a historical text corpus and to collaboratively manage a conceptual thesaurus.
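The abstract does not specify the markup used for the semantic annotation step. The following is only a minimal sketch, assuming that entities in the text are tagged with TEI elements such as `persName` and `placeName` and linked through the standard TEI `@ref` attribute to records in a data repository. The `example.org` URIs, the sample sentence and the function name `extract_annotations` are hypothetical placeholders, not the project's actual identifiers or tooling. It shows how such annotations can be harvested from the text so that each mention can be matched against the ontology-based data.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical TEI fragment: names and URIs are invented placeholders.
SAMPLE = f"""<p xmlns="{TEI_NS}">
  <persName ref="https://example.org/actor/1">Pierre Dupont</persName> was born in
  <placeName ref="https://example.org/place/42">Lyon</placeName> in
  <date when="1650">1650</date>.
</p>"""

# TEI elements assumed (for this sketch) to carry entity annotations.
ANNOTATION_TAGS = ("persName", "placeName", "orgName", "rs")


def extract_annotations(tei_xml: str) -> list[tuple[str, str, str]]:
    """Return (element name, surface text, @ref URI) for each annotated TEI element."""
    root = ET.fromstring(tei_xml)
    results = []
    for elem in root.iter():
        # ElementTree stores namespaced tags as "{namespace}localname".
        ns, _, local = elem.tag.rpartition("}")
        if ns.lstrip("{") != TEI_NS or local not in ANNOTATION_TAGS:
            continue
        ref = elem.get("ref")
        if ref:
            # itertext() also collects text from nested children; normalise whitespace.
            text = " ".join("".join(elem.itertext()).split())
            results.append((local, text, ref))
    return results


if __name__ == "__main__":
    for tag, text, ref in extract_annotations(SAMPLE):
        print(f"{tag}\t{text}\t{ref}")
```

Keeping the link to the repository in a plain `@ref` attribute leaves the TEI markup minimal, while the richer historical information stays in the ontology-based data that the URI points to.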
Contributor: Francesco Beretta
Submitted on: Wednesday, June 14, 2017, 10:10:41 PM
Last modification on: Thursday, February 7, 2019, 3:47:35 PM


Distributed under a Creative Commons Attribution - ShareAlike 4.0 International License


HAL Id: halshs-01539489, version 1



Francesco Beretta. Collaboratively Producing Interoperable Ontologies and Semantically Annotated Corpora: the project. Third International Workshop on Semantic Web for Scientific Heritage, May 2017, Portoroz, Slovenia. ⟨halshs-01539489⟩


