Computer-assisted language comparison with RefLex

Abstract: The RefLex database is devoted to lexical data from African languages. As of January 2019, it contains 1,168,209 lexical entries from 1,312 sources covering 799 languages. Besides its importance as a freely available online Reference Lexicon (nearly every entry is linked to an image of the original page on which it appears), RefLex offers many tools designed to help the linguist perform various tasks pertaining to language comparison, language typology or historical linguistics. The ways in which the tools can help are best summed up as follows: Counting; Organising; Retrieving. After a quick overview, my talk will present some of these tools: 1. Counting: the Statistical tool. It counts all kinds of combinations in a given field and displays the result as a contingency table in which values well above or well below the expected ones are highlighted. The Statistical tool can also count combinations of values in two different fields (for instance, Part of Speech and Tone Pattern). As a side effect, this tool is also very useful as an error finder, whether the errors lie in the original source or in the coding of the data. Examples will show how specific counts can help to suggest regular sound changes. 2. Organising: the Reconstruction tool. It consists of a set of user-friendly interfaces designed to select sources, then to select words, then to align words phonetically and finally to manage the correspondence sets thus created. Nothing is automatic here: the choice of cognates and the details of the phonetic alignments are left to the linguist alone, but the interface makes these steps very easy to handle. In addition, several registered users can work collaboratively on the same dataset. 3. Retrieving: the Loanword tool.
Currently under development, this very specific tool will take advantage of the fact that, whenever the information is available in the original source, the borrowing status of a word is hard-coded in the database. In addition, any user can add borrowing information to any record. So far, more than 20,000 entries have been identified as loanwords. The Loanword tool will make it possible to find out which donor languages are most prominent, which notions are borrowed most often and where, and other similar facts. It will of course make use of the mapping features already present in RefLex. These three features of RefLex illustrate how an online lexical database can contribute to a better understanding of language history by going far beyond the mere display of lexical material.
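
To make the idea behind the Statistical tool concrete, here is a minimal sketch in Python of a two-field contingency count in which cells far from their expected value are flagged. It is an illustration only, not RefLex's implementation: the abstract does not say how expected values or highlighting thresholds are computed, so the sketch assumes expected counts under independence (row total × column total / grand total) and flags cells by their Pearson residual. The field names `pos` and `tone`, the function names, the threshold and the toy records are all hypothetical.

```python
from collections import Counter
from math import sqrt


def contingency(records, field_a, field_b):
    """Count how often each pair of values of two fields co-occurs."""
    return Counter((r[field_a], r[field_b]) for r in records)


def flag_cells(observed, threshold=2.0):
    """Return {cell: (observed, expected, residual)} for outlying cells.

    Expected counts assume the two fields are independent (an assumption
    of this sketch). A cell is flagged when the absolute Pearson residual
    (observed - expected) / sqrt(expected) reaches the threshold.
    """
    total = sum(observed.values())
    row_totals, col_totals = Counter(), Counter()
    for (a, b), n in observed.items():
        row_totals[a] += n
        col_totals[b] += n

    flagged = {}
    for a in row_totals:
        for b in col_totals:
            expected = row_totals[a] * col_totals[b] / total
            obs = observed.get((a, b), 0)
            residual = (obs - expected) / sqrt(expected)
            if abs(residual) >= threshold:
                flagged[(a, b)] = (obs, expected, residual)
    return flagged


# Invented toy data: Part of Speech against Tone Pattern.
records = (
    [{"pos": "noun", "tone": "H"}] * 20
    + [{"pos": "noun", "tone": "L"}] * 2
    + [{"pos": "verb", "tone": "H"}] * 2
    + [{"pos": "verb", "tone": "L"}] * 20
)

table = contingency(records, "pos", "tone")
flagged = flag_cells(table)

for cell, (obs, exp, res) in sorted(flagged.items()):
    print(cell, obs, round(exp, 1), round(res, 2))
```

In this toy dataset every cell is flagged, because nouns are strongly associated with the H pattern and verbs with the L pattern. In real data, a single isolated outlying cell is the kind of result that may point either to a genuine regularity or to a coding error worth checking against the source.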
