Analysis of Corpora of Texts of Terrorist and Unlawful Content

Abstract: The purpose of the study is to develop a technique for creating and automatically analyzing special corpora, so that they can subsequently serve as training datasets and as a basis for identifying distinguishing features in text classification tasks. The method relies on the analysis tools of the TXM platform, extended with new procedures that compute additional text characteristics such as letter combinations, pseudo-stems, noun phrases and verb phrases. The results show that the extensions developed for the TXM corpus platform make it possible to solve text analysis problems in specialized subject areas effectively, and that the resulting corpus of extremist texts can be used as a training set for text classification. The study concludes that letter combinations can serve as universal distinguishing features alongside classical linguistic characteristics of texts.
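As a rough illustration of how letter combinations might be counted and compared between a target corpus and a reference corpus, here is a minimal Python sketch. It is not the TXM extension described in the abstract: the function names, the default trigram length, and the ranking by simple difference in relative frequency are all assumptions made for this example, and the study may use a different measure.

```python
from collections import Counter


def letter_ngrams(text, n=3):
    """Count letter n-grams inside each whitespace-separated word.

    Hypothetical helper: non-letter characters are dropped and the
    text is lowercased before counting.
    """
    counts = Counter()
    for word in text.lower().split():
        letters = "".join(ch for ch in word if ch.isalpha())
        for i in range(len(letters) - n + 1):
            counts[letters[i:i + n]] += 1
    return counts


def relative_freq(counts):
    """Convert raw counts to relative frequencies (empty dict if no counts)."""
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def distinguishing_ngrams(target_texts, reference_texts, n=3, top=5):
    """Rank n-grams by how much more frequent they are in the target corpus.

    Assumption: the score is a plain difference of relative frequencies,
    used here only as a simple stand-in for a proper specificity measure.
    """
    target, reference = Counter(), Counter()
    for text in target_texts:
        target.update(letter_ngrams(text, n))
    for text in reference_texts:
        reference.update(letter_ngrams(text, n))
    tf, rf = relative_freq(target), relative_freq(reference)
    scores = {gram: tf[gram] - rf.get(gram, 0.0) for gram in tf}
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scores, key=lambda g: (-scores[g], g))[:top]
```

The ranked n-grams could then be used as candidate features for a classifier, alongside word-level or phrase-level features.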

https://halshs.archives-ouvertes.fr/halshs-02266136
Contributor : Alexei Lavrentiev
Submitted on : Tuesday, August 13, 2019 - 1:52:29 PM
Last modification on : Wednesday, August 14, 2019 - 1:20:08 AM

Citation

Alexei Lavrentiev, Ivan Smirnov, Margarita Suvorova, Fedor Solovyev, Alina Fokina, et al. Анализ корпусов текстов террористической и антиправовой направленности [Analysis of Corpora of Texts of Terrorist and Unlawful Content]. Voprosy kiberbezopasnosti, NPO Eshelon, 2019, pp. 54-60. ⟨10.21681/2311-3456-2019-4-54-60⟩. ⟨halshs-02266136⟩
