
Are Neural Networks Extracting Linguistic Properties or Memorizing Training Data? An Observation with a Multilingual Probe for Predicting Tense

Abstract: We evaluate the ability of BERT embeddings to represent tense information, taking French and Chinese as a case study. In French, tense information is expressed by verb morphology and can be captured from simple surface information. In contrast, tense interpretation in Chinese is driven by abstract lexical, syntactic, and even pragmatic information. We show that while French tenses can easily be predicted from sentence representations, results drop sharply for Chinese, which suggests that BERT is more likely to memorize shallow patterns from the training data than to uncover abstract properties.
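The kind of diagnostic ("probing") experiment the abstract describes can be illustrated with a minimal sketch: freeze sentence embeddings and train a simple linear classifier to predict a tense label from them. The code below is a hypothetical illustration, not the authors' implementation; the embeddings are random stand-ins for BERT vectors with an artificially planted linear signal, and all dimensions and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for frozen sentence embeddings. In the real setup these
# would be BERT sentence representations; here we plant a linear "tense"
# signal in dimension 0 so the probe has something to find.
n, d = 400, 64
X = rng.normal(size=(n, d))
y = rng.integers(0, 2, size=n).astype(float)  # 0 = past, 1 = present (toy labels)
X[:, 0] += 3.0 * y  # planted signal: dimension 0 correlates with the label

X_tr, y_tr = X[:300], y[:300]
X_te, y_te = X[300:], y[300:]

# The probe itself: a linear classifier trained by gradient descent on the
# logistic loss. The embeddings stay fixed; only w and b are learned.
w, b, lr = np.zeros(d), 0.0, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X_tr @ w + b)))   # sigmoid predictions
    w -= lr * (X_tr.T @ (p - y_tr)) / len(y_tr)  # gradient of mean logistic loss
    b -= lr * float(np.mean(p - y_tr))

# High held-out accuracy means the property is linearly decodable from the
# representations; the paper's point is that such success may reflect shallow
# surface cues (as in French) rather than abstract knowledge (as in Chinese).
pred = (X_te @ w + b > 0).astype(float)
accuracy = float(np.mean(pred == y_te))
print(f"probe accuracy: {accuracy:.2f}")
```

Because the signal here is planted, the probe scores well by construction; the interesting empirical question is how accuracy behaves on real embeddings when the target property cannot be read off surface forms.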
Document type :
Conference papers

https://halshs.archives-ouvertes.fr/halshs-03197072
Contributor: Guillaume Wisniewski
Submitted on : Tuesday, April 13, 2021 - 2:41:24 PM
Last modification on : Tuesday, September 28, 2021 - 5:14:54 PM
Long-term archiving on: Wednesday, July 14, 2021 - 6:39:43 PM

File

chinese_tense.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: halshs-03197072, version 1

Citation

Bingzhi Li, Guillaume Wisniewski. Are Neural Networks Extracting Linguistic Properties or Memorizing Training Data? An Observation with a Multilingual Probe for Predicting Tense. EACL 2021, Apr 2021, Kiev (online), Ukraine. ⟨halshs-03197072⟩
