How to build a corpus for a tool-based approach to determinologisation in the field of particle physics
Abstract
This paper discusses the corpus design and corpus building issues that arise when dealing with a complex, multidimensional linguistic phenomenon such as determinologisation. Representing this phenomenon in corpus data calls for fresh reflection on both the dimensions involved in the determinologisation process and some of the essential concepts of corpus building. In particular, the paper focuses on the need to represent the progressive aspects of determinologisation in the corpus, i.e. across levels of specialisation and through time, and on the practical issues this raises. It also shows that a corpus representative of determinologisation in a specific domain (in this case, particle physics) requires clear and objective criteria for selecting individual texts. Four principles are established to this end. The discussion leads to the proposal of a robust text selection procedure, which ensures that the peculiarities of determinologisation in the domain of particle physics are reflected in the corpus.