Studying Socially Unacceptable Discourse Classification (SUD) through different eyes: "Are we on the same page ? "

We study Socially Unacceptable Discourse (SUD) characterization and detection in online text. We first build and present a novel corpus that contains a large variety of manually annotated texts from different online sources used so far in state-of-the-art machine learning (ML) SUD detection solutions. This global context allows us to test the generalization ability of SUD classifiers that acquire knowledge around the same SUD categories, but from different contexts. From this perspective, we analyze how (possibly) different annotation modalities influence SUD learning, and we discuss open challenges and research directions. We also provide several data insights that can support domain experts in the annotation task.

Keywords

SUD Classification, Machine Learning, Deep Learning, Transfer Learning, Annotation Guidelines

Domains

Computer Science [cs], Linguistics

Complete list of metadata

Submission Type	File
Deposit type	Conference papers
Authors	Bruno Machado Carneiro (1), Michele Linardi (2, 3), Julien Longhi (4, 5, 6)
1 ENSEA - Ecole Nationale Supérieure de l'Electronique et de ses Applications (265277) - 6 avenue du Ponceau - CS 20 707 Cergy - 95014 Cergy-Pontoise Cedex - France
2 CY - CY Cergy Paris Université (1003413) - 33 boulevard du port, 95015 Cergy-Pontoise Cedex - France
3 ETIS - UMR 8051 - Equipes Traitement de l'Information et Systèmes (1003474) - 6, avenue du Ponceau. F 95014 CERGY-PONTOISE CEDEX - France ; Ecole Nationale Supérieure de l'Electronique et de ses Applications (265277) ; Centre National de la Recherche Scientifique UMR8051 (441569) ; CY Cergy Paris Université (1003413)
4 IUF - Institut universitaire de France (56663) - Maison des Universités 103 Boulevard Saint-Michel 75005 Paris - France ; Ministère de l'Education nationale, de l’Enseignement supérieur et de la Recherche (301855)
5 IDHN - Institut des Humanités numériques (1003543) - 8 Boulevard de l'Oise 95000 CERGY - France ; Equipes Traitement de l'Information et Systèmes UMR 8051 (1003474) ; Ecole Nationale Supérieure de l'Electronique et de ses Applications (265277) ; Centre National de la Recherche Scientifique UMR8051 (441569) ; CY Cergy Paris Université (1003413) ; Lexiques, Textes, Discours, Dictionnaire - Centre Jean Pruvost EA 7518 (1003481) ; CY Cergy Paris Université EA 7518 (1003413) ; Laboratoire Mobilités, Réseaux, Territoires, Environnements EA 4112 (1003483) ; CY Cergy Paris Université EA4113 (1003413) ; Laboratoire AGORA EA 7392 (1003493) ; CY Cergy Paris Université EA7392 (1003413)
6 AGORA - EA 7392 - Laboratoire AGORA (1003493) - Université de Cergy-Pontoise - Chênes I - 33 Boulevard du port 95011 Cergy - France ; CY Cergy Paris Université EA7392 (1003413)
Audience	International
Peer-reviewed	Yes
Science popularization	No
Conference or book title	International Conference on CMC and Social Media Corpora for the Humanities
Start conference date	2023-09-14
End conference date	2023-09-15
City	Mannheim, Germany
Country	Germany
Proceedings	No
Fulltext language	English
Licence	Attribution
Publication date	2023-09-28
Invited	No
Domain	Computer Science [cs]; Humanities and Social Sciences/Linguistics
European project(s)	ARENAS - Analysis of and Responses to Extremist Narratives (Horizon grant agreement ID: 101094731; CORDIS number: 101094731)
Keywords	en SUD Classification, Machine Learning, Deep Learning, Transfer Learning, Annotation Guidelines

Main file

CMC-Carneiro_Linardi_Longhi.pdf (809.8 KB)

Origin : Files produced by the author(s)

Contributor : Michele Linardi

https://hal.science/hal-04316521

Submitted on: Monday, December 4, 2023 at 5:25:37 PM

Last modification on: Monday, April 15, 2024 at 11:25:23 AM

Dates and versions

hal-04316521, version 1 (04-12-2023)

Licence

Attribution - CC BY 4.0

Identifiers

HAL Id : hal-04316521 , version 1

Cite

Bruno Machado Carneiro, Michele Linardi, Julien Longhi. Studying Socially Unacceptable Discourse Classification (SUD) through different eyes: "Are we on the same page ? ". International Conference on CMC and Social Media Corpora for the Humanities, Sep 2023, Mannheim, Germany. ⟨hal-04316521⟩


Collections

CNRS UNIV-CERGY ETIS ETIS-MIDI CY-TECH-SM AGORA-CY CY-ART-HUMANITES IDHN
