A visual approach for text analysis using multiword topics

Abstract: Topics in a text corpus carry features and information, and visualizing these topics can improve a user's understanding of the corpus. Topics can be broadly divided into two categories: those whose meaning can be described in one word and those whose meaning is expressed through a combination of words. Topics of the latter type are multiword expressions. Extracting multiword topics accurately, however, requires systematic analysis. We therefore propose a visual system that accurately extracts topics made up of multiple word combinations. For this study, we use the text of 957 speeches from 43 U.S. presidents (from George Washington to Barack Obama) as corpus data. Our visual system is divided into two parts. In the first part, the system refines the database by topic, including multiword topics; through this data processing, we systematically analyze how accurately multiword topics are extracted. In the second part, users can inspect the details of this result with a word cloud and simultaneously verify it against the raw corpus. The two parts are synchronized, and users can change the value of N in the N-gram model as well as the topics and presidents examined. Through this case study of U.S. presidential speech data, we verify the effectiveness and usability of our system.
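The abstract does not describe how multiword topics are extracted from the speeches. As a rough illustration only, and not the authors' pipeline, the sketch below counts contiguous N-word sequences (N-grams) with a user-selectable N across several speeches, which is the kind of input a word cloud could be built from. The tokenizer and the function names (`tokenize`, `ngram_counts`, `top_ngrams`) are hypothetical, and the counts are raw frequencies with no stop-word filtering or collocation scoring.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(text, n):
    """Count contiguous n-word sequences in a single text."""
    tokens = tokenize(text)
    # zip over n shifted copies of the token list yields each n-word window.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)


def top_ngrams(speeches, n, k=10):
    """Aggregate n-gram counts over several speeches; return the k most frequent."""
    total = Counter()
    for text in speeches:
        total.update(ngram_counts(text, n))
    return total.most_common(k)
```

For example, `top_ngrams(["We the people of the United States", "the people of the world"], 2, 3)` returns the three bigrams that occur in both texts ("the people", "people of", "of the"), each with a count of 2. Changing `n` corresponds to altering the value of N mentioned in the abstract.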
Keywords : text analysis
Document type: Conference paper
EuroVis 2017, Jun 2017, Barcelona, Spain. Anna Puig Puig; Tobias Isenberg, Eurographics, EuroVis 2017 - Posters. 〈http://eurovis2017.virvig.es/〉. 〈10.2312/eurp.20171168〉


https://halshs.archives-ouvertes.fr/halshs-01590990
Contributor: Seongmin Mun
Submitted on: Wednesday, 20 September 2017 - 15:44:30
Last modified on: Thursday, 11 January 2018 - 06:17:45


Citation

Seongmin Mun, Guillaume Desagulier, Kyungwon Lee. A visual approach for text analysis using multiword topics. EuroVis 2017, Jun 2017, Barcelona, Spain. Anna Puig Puig; Tobias Isenberg, Eurographics, EuroVis 2017 - Posters. 〈http://eurovis2017.virvig.es/〉. 〈10.2312/eurp.20171168〉. 〈halshs-01590990〉
