Mesures et savoirs : Quelles méthodes pour l’histoire culturelle à l’heure du big data ? (Measures and Knowledge: What Methods for Cultural History in the Age of Big Data?)

Abstract : The quantitative analysis of cultural history began with the appearance of massive, freely accessible datasets such as Google Books, and has become known as "culturomics". It is now available to researchers and literary critics, giving them access to cultural facts and their evolution through textual traces in digitized data. Such massive corpora cannot be analyzed blindly: they do not always come with substantial metadata and, in the worst cases, can be very noisy. For corpora of this size, that is, billions of words, common visualization tools such as Voyant Tools or TXM, and the analytical methods they implement, are not reliably efficient. As part of a literary history project run jointly by the Labex OBVIL and the Stanford Literary Lab, which aims to define literature as a word, a concept and a semantic field and to trace an empirical history of literature, we analyzed 1,618 French books (a corpus of 140 million words) spanning the period from the end of the Ancien Régime to the Second World War. To do so, we used several experimental text-mining techniques, combining distant and close reading. In this article, we explore several kinds of text mining: closed (frequency-based) measures, unsupervised machine analysis (topic modeling) and semi-open methods (collocations), pointing out the benefits and drawbacks of each. We then show why deeper and more precise text mining is necessary, drawing on richer metadata such as lemmatization, syntactic structure and semantic analysis (e.g., word vectors). Finally, we argue that a substantial study of large literary corpora cannot separate distant from close reading, since each confirms or contradicts the other in a way that is highly effective for producing evolving representations of the history of literature.
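The difference between a closed (frequency-based) measure and a semi-open one (collocations) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes the text is already tokenized, the function names `relative_frequency` and `collocates` are hypothetical, and scoring collocates with pointwise mutual information (PMI) over a fixed window is an assumption, chosen as one common association measure among several.

```python
import math
from collections import Counter


def relative_frequency(tokens, word, per=1_000_000):
    """Closed measure: occurrences of `word` per `per` tokens (hypothetical helper)."""
    if not tokens:
        return 0.0
    return Counter(tokens)[word] * per / len(tokens)


def collocates(tokens, node, window=5, min_count=2):
    """Semi-open measure: rank the words found within +/- `window` tokens of
    `node` by pointwise mutual information (hypothetical helper).

    PMI(node, w) = log2(P(node, w) / (P(node) * P(w))), with every
    probability estimated from raw counts over the n tokens of the corpus.
    """
    n = len(tokens)
    freq = Counter(tokens)
    pair_counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Count every token inside the window around this occurrence of `node`.
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j != i:
                pair_counts[tokens[j]] += 1
    scores = {}
    for w, c in pair_counts.items():
        if c < min_count:
            continue  # PMI overrates rare pairs, so drop them
        scores[w] = math.log2(c * n / (freq[node] * freq[w]))
    # Highest association first; ties keep their first-seen order.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


tokens = "la littérature française et la littérature moderne la poésie française".split()
print(relative_frequency(tokens, "littérature"))
print(collocates(tokens, "littérature", window=2, min_count=1))
```

On this toy token list, "littérature" occurs 200,000 times per million tokens, and its top collocate within two tokens is "et". The `min_count` threshold matters in practice: on a corpus of 140 million words, unfiltered PMI tends to promote hapaxes and typos, which is one reason such semi-open results still call for close reading.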

Cited literature: 15 references

https://halshs.archives-ouvertes.fr/halshs-02430078
Contributor : Alexandre Gefen
Submitted on : Monday, January 13, 2020 - 12:07:26 PM
Last modification on : Tuesday, January 14, 2020 - 1:43:50 AM

File

 Restricted access
To satisfy the distribution rights of the publisher, the document is embargoed until : 2021-01-07

Citation

Marianne Reboul, Alexandre Gefen. Mesures et savoirs : Quelles méthodes pour l’histoire culturelle à l’heure du big data ? Semiotica, De Gruyter, 2019, 2019 (230), pp.97-120. ⟨10.1515/sem-2018-0103⟩. ⟨halshs-02430078⟩
