Detmar Meurers, Niels Ott and Ramon Ziai
Proceedings of Linguistic Evidence 2010.
Corpora in linguistics and computational linguistics have traditionally been assembled from data sources such as newspaper texts, books and, more recently, the web. While these sources provide large quantities of language data, typically very little or nothing is known about the context in which the text was produced. The only information an analysis can refer to is the text itself, e.g., when a sentence is analyzed using the preceding sentences for disambiguation. However, language is always produced in a concrete extra-linguistic context. This contextual setting includes world knowledge and situational knowledge, i.e., the aspects of world knowledge that are relevant for interpreting the given text and the concrete task and situation that the language was produced for.
The notion of a task and the evaluation of language in context play a particularly important role in foreign language teaching and learning (cf., e.g., Ellis 2003), and a representation of the learner's ability to use language in context and to perform tasks using appropriate task strategies has been argued to be crucial for learner modeling (Amaral and Meurers 2008). However, the so-called learner corpora created to document the language produced by language learners typically consist only of learner essays.
In this paper, we present our efforts at collecting a longitudinal learner corpus consisting of the answers to reading comprehension questions, including an explicit representation of the task context and learner information. After introducing the data sources and characteristics of the corpus we are collecting, we discuss the development of the open-source WELCOME tool, which we have created to facilitate the interdisciplinary exchange of the contextualized learner corpus between the language programs providing the data and the computational linguists working on its encoding and automatic analysis.
Electronically available:
Note: The electronic versions of the publications linked on this page are the last versions I had the copyright for. Where a publisher copyedited and/or typeset the papers, the electronic copies linked here are NOT identical to the officially published version, which should be used for any quotes, references to page numbers, etc.
Bibtex entry:
@inproceedings{Meurers:Ott:Ziai:09,
author = {Meurers, Detmar and Ott, Niels and Ziai, Ramon},
title = {Compiling a Task-Based Corpus for the Analysis of Learner Language in Context},
booktitle = {Proceedings of Linguistic Evidence 2010},
pages = {214--217},
address = {Tübingen},
url = {http://purl.org/dm/papers/meurers-ott-ziai-10.html},
year = {2010}
}