Towards a Motivated Annotation Schema of Collocation Errors in Learner Corpora

Title: Towards a Motivated Annotation Schema of Collocation Errors in Learner Corpora
Publication Type: Conference Paper
Year of Publication: 2010
Authors: González SP, Suárez EM, Veiga NV, Margarita Alonso Ramos, Wanner L, Vincze O, Casamayor G
Conference Name: Seventh International Conference on Language Resources and Evaluation (LREC'10), 19-21 May
Date Published: 2010
Publisher: European Language Resources Association (ELRA)
Conference Location: Valletta, Malta
ISBN Number: 2-9517408-6-7
Abstract

Collocations play a significant role in second language acquisition. To offer efficient support to learners, an NLP-based CALL environment for learning collocations should be based on a representative learner corpus annotated for collocation errors. However, so far, no theoretically motivated collocation error tag set is available. Existing learner corpora tag collocation errors simply as “lexical errors”, which is clearly insufficient given the wide range of collocation errors that learners make. In this paper, we present a fine-grained three-dimensional typology of collocation errors that has been derived in an empirical study from the learner corpus CEDEL2, compiled by a team at the Autonomous University of Madrid. The first dimension captures whether the error concerns the collocation as a whole or one of its elements; the second dimension captures the language-oriented error analysis, while the third exemplifies the interpretative error analysis. To facilitate smooth annotation along this typology, we adapted Knowtator, a flexible off-the-shelf annotation tool implemented as a Protégé plugin.