Journal articles

De la segmentation dans les tweets : signes de ponctuation, connecteurs, émoticônes et émojis

Abstract : In this paper, drawing on a corpus of 3,444,075 tweets comprising 44,107,210 tokens (words, punctuation marks, emojis, emoticons, etc.) collected in December 2016, we examine the segmentation devices at work in tweets. After outlining some characteristics of this particular form of writing, we review the general segmentation devices of written language, namely punctuation and connectors. We then look at how these devices operate in tweets. Finally, we show that emoticons and emojis are devices specific to tweets (and to other digital writing, such as SMS and email) that allow users to diversify their segmentation strategies.
Document type : Journal articles

https://halshs.archives-ouvertes.fr/halshs-02476998
Contributor : Jean-Philippe Magué
Submitted on : Thursday, February 13, 2020 - 10:04:11 AM
Last modification on : Tuesday, May 12, 2020 - 3:56:13 PM

Identifiers

  • HAL Id : halshs-02476998, version 1

Citation

Jean-Philippe Magué, Nathalie Rossi-Gensane, Pierre Halté. De la segmentation dans les tweets : signes de ponctuation, connecteurs, émoticônes et émojis. Corpus, Bases, Corpus, Langage - UMR 7320, 2020, 20. ⟨halshs-02476998⟩
