
Separating fact and fiction: The real story of corpus use in language teaching

Abstract : Corpora have been used in language teaching for decades, especially at university level by teachers aware of research in corpus linguistics. Given the variety of corpora, tools and techniques, the actual uses to which corpora have been put have proved to be highly varied and hard to pin down. Belying the frequent lamentation of the dearth of empirical research in the field, this paper introduces a corpus of 600K words consisting of 110 research articles published in just over 20 years, all of which seek to evaluate some aspect of corpus use by L2 learners. The overall pattern that emerges is highly pragmatic, as witnessed by the dominance of corpus-based (367 occurrences in 69 papers, collocating especially with activities and approach) over corpus-driven (only 64 occurrences in 11 papers, collocating more with language and research). This might seem surprising, as the most frequently cited researcher overall is Tim Johns (305 occurrences in 77 papers), who coined the term "data-driven learning" in this context. However, it also reflects a preoccupation with classroom concerns rather than corpus-linguistic criteria: most of the other frequently cited authors in the corpus are primarily language teaching specialists rather than corpus linguists. The most frequent lemmas (minus stop-words) provide a succinct if rough overview of the field: 'Learners using data from texts in a corpus for language learning or writing through concordancing, searching for patterns of vocabulary or grammar use in actual examples to improve their knowledge of English.' 
This general picture is revealing as much for what is absent as for what is present. Many of the key advantages generally attributed to corpus use are rarely mentioned: individualisation, constructivism, collaborative learning and noticing (and related forms) each occur fewer than once per 10,000 words, while no more than two papers refer frequently (ten times or more) to concepts such as responsibility, exposure, learning styles, communicative skills and autonomy. These and other key terms are analysed in context here, but the suggestion persists that such concepts remain under-researched to date. Finally, the corpus is divided into two sections according to date of publication to identify old and new themes and the development of the research questions addressed. Inevitably, this reveals a change from hard, locally available corpora to on-line corpora (including Google and the web-as-corpus), but also, more surprisingly perhaps, a move from concordancing vocabulary for language learning towards the use of corpora as an aid to writing, with increased focus on discourse. The implication is that this may represent the main use of language corpora for pedagogical purposes in the future: as a reference resource for writing rather than as a general-purpose learning aid.
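The counts reported in the abstract combine three measures: raw occurrences of a term, the number of papers in which it appears, and a rate per 10,000 words. The following is a minimal Python sketch of how such figures could be computed. It is not the author's procedure or tooling: the function names `tokenize` and `term_profile`, the simple regex tokenizer, and the toy three-paper corpus are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Hyphenated forms such as "corpus-based" are kept as single tokens.
    This tokenizer is an illustrative assumption, not the one used in the paper.
    """
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def term_profile(papers, term):
    """Return occurrences, paper count and rate per 10,000 words for a term.

    `papers` maps a (hypothetical) paper identifier to its full text.
    """
    term = term.lower()
    total_tokens = 0
    occurrences = 0
    papers_with_term = 0
    for text in papers.values():
        tokens = tokenize(text)
        total_tokens += len(tokens)
        hits = tokens.count(term)
        occurrences += hits
        if hits > 0:
            papers_with_term += 1
    # Normalise the raw count so corpora of different sizes can be compared.
    per_10k = 10000 * occurrences / total_tokens if total_tokens else 0.0
    return {
        "occurrences": occurrences,
        "papers": papers_with_term,
        "per_10k": per_10k,
    }


# Toy data, invented purely to exercise the function.
papers = {
    "p1": "A corpus-based approach to corpus-based activities.",
    "p2": "Corpus-driven research on language.",
    "p3": "Learners search a corpus.",
}
profile = term_profile(papers, "corpus-based")
print(profile["occurrences"], profile["papers"])
```

Applied to the figures given in the abstract, and taking the corpus size as roughly 600,000 words, the 367 occurrences of corpus-based would correspond to about 6 per 10,000 words, whereas the terms described as rare fall below 1 per 10,000.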
Document type : Conference papers

Cited literature : 14 references
Contributor : Alex Boulton
Submitted on : Friday, July 27, 2018 - 3:49:50 PM
Last modification on : Thursday, October 22, 2020 - 10:29:41 AM
Long-term archiving on : Sunday, October 28, 2018 - 1:47:26 PM




  • HAL Id : halshs-00837807, version 1



Alex Boulton. Separating fact and fiction: The real story of corpus use in language teaching. 20 years of Eurocall: Learning from the Past, Looking to the Future, Sep 2013, Evora, Portugal. pp.51-56. ⟨halshs-00837807⟩


