Prof. Detmar Meurers: ISCL Hauptseminar (advanced seminar), Winter Semester 2008/2009
Exploring the Automatic Analysis of Learner Language
Abstract:
The availability of linguistically annotated corpora supports
important empirical insights into how language works, and such corpora
have become essential for training and testing human language technology.
The language produced by second language learners differs in significant
ways from that produced by native speakers. A systematic, corpus-based
analysis of the over- and underuse of specific constructions or the
occurrence of errors that are typical for particular learner populations
can help answer questions on how languages are acquired and support the
development of human language technology that can analyze learner
language, for example to provide feedback in Intelligent Computer-Aided
Language Learning (ICALL) applications.
In this seminar we will discuss research on the development of
annotation schemes for learner language, with a particular focus on what
distinctions can be reliably annotated and how natural language
processing tools can be adapted or created to automate such analysis for
efficient corpus annotation and ICALL applications.
Instructor: Detmar Meurers
- Office: Room 1.28, Blochbau (Wilhelmstr. 19)
- Email: dm@sfs.uni-tuebingen.de
- Office hours: Tuesdays 14:00-15:00
Course meets:
- Tuesdays, 10 c.t.-12 (10:15-12:00), Seminar room 1.13, Blochbau (Wilhelmstr. 19)
- Thursdays, 10 c.t.-12 (10:15-12:00), Seminar room 1.13, Blochbau (Wilhelmstr. 19)
- Fridays, 12 c.t.-14 (12:15-14:00), Computer lab 2.26, Blochbau (Wilhelmstr. 19)
Online materials: We will be using the new
department Moodle site for the course, which is accessible at
http://courses.sfs.uni-tuebingen.de. You will use it to find
the updated syllabus, slides, and pointers to reading material, and
to post questions and participate in the discussion.
The first time you visit the department Moodle, you will need to
create an account. To do so, select "Create new account", enter
your department user id as the id, pick a new password (do not
use the same password for Moodle as for your department account), and
enter your department email address
(i.e., YOUR-ID@sfs.uni-tuebingen.de) as your email address. If you do
not yet have a department account, please contact me as soon as possible.
For questions concerning the department accounts and computer system,
you can contact the system administrator Jochen Saile. His office
hours are: Thursdays 9-11 in room 2.25, Blochbau (Wilhelmstr. 19),
email: saile@sfs.uni-tuebingen.de, phone: 29-78487.
We will also send you email about the class from time to time.
Please be sure to read email sent to your department account at least
once a day. You can ask Jochen Saile to forward your department email
to another account that you read regularly.
Nature of the course and my expectations: This is a
research-oriented seminar, i.e., each participant is expected to take
an active role in exploring the topic. More concretely, each
participant is expected to
- participate regularly and actively in the class and lab
  discussions, read the papers assigned by me or by the presenters, and
  post a question about each reading to the "Reading Discussion Forum"
  on Moodle no later than the day before it is discussed in
  class (30% of grade).
  Note: Following the rules of the Neuphilologische Fakultät,
  missing more than two meetings without an excuse automatically results
  in failing the class.
- explore and present a topic (30% of grade):
  - select a topic by the end of October
  - thoroughly research the topic, taking my literature pointers
    as a starting point
  - prepare slides for the presentation and discuss them with me
    during my office hours in the week before the presentation
  - after our meeting, start a new Moodle thread in the "Reading
    Discussion Forum" specifying what every course participant should
    read to prepare for your presentation, at least a week
    before your presentation
  - present the topic in class
- in groups of two, design, apply, and document a learner language
  annotation scheme and submit it as your "Hausarbeit" (term paper)
  before the beginning of the next semester (40% of grade):
  - define an annotation scheme
  - document the distinctions and special cases in an annotation manual
  - individually (!) annotate a small learner corpus that I will hand
    out after the Christmas break, compare the two annotations to
    obtain inter-annotator agreement figures, attempt to resolve
    conflicts, revise the annotation manual, and document the
    difficulties
  - hand in the annotation manual as your "Hausarbeit" before the
    next semester starts
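The annotation task above asks each pair to compute inter-annotator agreement figures, but the syllabus does not prescribe a particular measure. As an illustration only, the following Python sketch computes raw observed agreement and Cohen's kappa, one of the chance-corrected coefficients surveyed by Artstein & Poesio (2009). It assumes that both annotators label the same, identically tokenized items with exactly one tag each; the function names, the tag names (`DET`, `PREP`, `OK`), and the example annotations are hypothetical.

```python
from collections import Counter


def observed_agreement(a, b):
    """Proportion of items on which both annotators chose the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and aligned item by item")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement.

    Chance agreement is estimated from each annotator's own label
    distribution (this is what distinguishes kappa from Scott's pi).
    """
    a_o = observed_agreement(a, b)  # also validates the input
    n = len(a)
    counts_a, counts_b = Counter(a), Counter(b)
    # Probability that both annotators pick the same label by chance.
    a_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if a_e == 1.0:
        raise ValueError("kappa is undefined when both annotators use one identical label")
    return (a_o - a_e) / (1 - a_e)


# Hypothetical per-token tags from two annotators (OK = no error).
ann1 = ["DET", "OK", "PREP", "OK", "OK", "DET", "OK", "PREP"]
ann2 = ["DET", "OK", "OK", "OK", "OK", "PREP", "OK", "PREP"]

print(observed_agreement(ann1, ann2))       # prints 0.75
print(round(cohens_kappa(ann1, ann2), 3))  # prints 0.579
```

Kappa discounts the agreement that would be expected by chance given how often each annotator uses each tag, so it is typically lower than raw agreement when a few tags (such as "no error") dominate. Schemes in which a token can receive several tags, or in which annotators mark error spans of different extent, would need additional handling before a coefficient like this applies.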
If you are in the fifth semester of the BA in ISCL and you want to
do an oral or a written exam, let me know before the Christmas break.
Academic conduct and misconduct: Research is
driven by discussion and the free exchange of ideas, motivations, and
perspectives, so you are encouraged to work in groups, discuss, and
exchange ideas. At the same time, the free exchange of ideas rests on
everyone being open about where they obtained which
information. Concretely, this means you are expected to always make
explicit when you have worked on something as a team (and keep in mind
that being part of a team always means sharing the work!). For any text
you write, you must provide explicit references for all ideas or
passages you reuse from elsewhere. This includes text
"found" on the web, for which you should cite the URL of the web site
if no more formal publication is available.
Class etiquette: Please do not read or work on
materials for other classes in class. When in the computer lab, only
use the computers when you are asked to do a specific activity; do
not read email or browse the web. Please come to class on time and do
not pack up early. All portable electronic devices such as cell
phones should be switched off for the entire length of the flight,
oops, class. If for some reason you must leave early, have an
important call coming in, or have to miss class for an important
reason, please let me know before class.
Topics:
- until 13.11. DETMAR:
  - Introduction and Annotating Learner Corpora: A CL perspective
  - Corpora and Corpus Annotation (Leech, 2004)
- 18./25.11. ALEKS: How does one obtain correct and consistent annotation?
  (Brants & Skut, 1998; Passonneau, 1997; Artstein & Poesio, 2009)
- Learner Corpora:
  - 27.11./2.12. MAGDALENA: Analyzing learner language (Ellis, 1994, ch. 2; Gass & Selinker, 2001, ch. 2; Nesselhauf, 2004)
  - 4./9.12. MARIJA: Computer-assisted error analysis
    (Tono, 2000; Granger, 1998; Dagneaux et al., 1998; Granger, 2004)
  - 11./16./18.12. PAOLINA, CHRIS: CHILDES annotation scheme and tools (http://childes.psy.cmu.edu/)
- Annotated Learner Corpora and Annotation Schemes:
  - 13./15.1. YANA: Overview (Tono, 2003; Díaz-Negrillo &
    Fernández-Domínguez, 2006) and Non-Native Corpus of English (NOCE) (Díaz-Negrillo, 2007)
  - 20.1. SASCHO: Cambridge Learner Corpus (CLC)
    (Nicholls, 2003) and Hong Kong University of Science and
    Technology Corpus of Learner English (HKUST)
    (Milton & Chowdhury, 1994)
  - 22.1. KATYA: Chinese Learners of English Corpus (CLEC)
  - 27.1. PETER: Standard Speaker Test (SST) and NICT Japanese Learner Corpus of English (NICT-JLE)
    (Izumi, 2006; Izumi et al., 2003, 2004, 2005)
  - 3.2. EKATERINA KOCHMAR: FRench Interlanguage DAtabase (FRIDA) (Granger, 2003)
  - 5.2. RUTH BECKER: FALKO: Ein Fehlerannotiertes Korpus des Deutschen als Fremdsprache
    (an error-annotated corpus of German as a foreign language)
    http://www.linguistik.hu-berlin.de/institut/professuren/korpuslinguistik/forschung/falko
  - 10.2. ALEXANDRU KATANA: Augsburger Fehlerkorpus (Augsburg error corpus): http://www.philhist.uni-augsburg.de/lehrstuehle/germanistik/DaF/projekte/fehler/ and Error analysis in unideutsch.de (Garnier et al., 2003)
  - 12.2. GUY, HANAN: Determiner and Preposition Errors in English
    (Lee & Seneff, 2006; Gamon et al., 2008; Tetreault & Chodorow, 2008; Eeg-Olofsson & Knutsson, 2003; ETS, 2008; Han et al., 2006; Nagata et al., 2006)
References:
- Artstein, R. & M. Poesio (2009). Survey Article: Inter-Coder Agreement
  for Computational Linguistics. Computational Linguistics pp. 1-42.
  URL http://www.mitpressjournals.org/doi/abs/10.1162/coli.07-034-R2.
- Brants, T. & W. Skut (1998). Automation of Treebank Annotation.
  In Proceedings of New Methods in Language Processing. Sydney,
  Australia.
- Dagneaux, E., S. Denness & S. Granger (1998). Computer-aided error
  analysis. System 26(2), 163-174.
  URL http://www.sciencedirect.com/science/article/B6VCH-3TN9MNX-1/2/2e434546d3bbd466fad8adb01a42f66c.
- Díaz-Negrillo, A. (2007). A Fine-Grained Error Tagger for Learner
  Corpora. Ph.D. thesis, University of Jaén, Spain.
- Díaz-Negrillo, A. & J. Fernández-Domínguez (2006). Error Tagging
  Systems for Learner Corpora. Revista Española de Lingüística Aplicada
  (RESLA) 19, 83-102.
  URL http://dialnet.unirioja.es/servlet/fichero_articulo?codigo=2198610&orden=72810.
- Eeg-Olofsson, J. & O. Knutsson (2003). Automatic Grammar Checking for
  Second Language Learners - the Use of Prepositions.
  In Proceedings of Nodalida'03. Reykjavik, Iceland.
  URL http://www.nada.kth.se/~knutsson/eegolofsson_knutsson.pdf.
- Ellis, R. (1994). The Study of Second Language Acquisition.
  Oxford: Oxford University Press.
- ETS (2008). Annotating Preposition Errors. Annotation Manual Version
  2.0-1. Internal Annotation Manual used at the Educational Testing
  Service (ETS).
- Gamon, M., J. Gao, C. Brockett, A. Klementiev, W. Dolan, D. Belenko &
  L. Vanderwende (2008). Using Contextual Speller Techniques and Language
  Modeling for ESL Error Correction.
  In Proceedings of IJCNLP. Hyderabad, India.
  URL http://www.mt-archive.info/IJCNLP-2008-Gamon.pdf.
- Garnier, S., Y. Tall, S. Fissaha & J. Haller (2003). Learner Corpora:
  Design, Development and Applications. Development of NLP tools for
  CALL based on learner corpora (German as a foreign language).
  In D. Archer, P. Rayson, A. Wilson & T. McEnery (eds.), Proceedings of
  the Corpus Linguistics 2003 Conference (CL 2003). Technical Papers 16.
  Lancaster University: University Centre for Computer Corpus Research
  on Language. pp. 246-252.
  URL http://ucrel.lancs.ac.uk/publications/CL2003/papers/garnier.pdf.
- Gass, S. M. & L. Selinker (2001). Second Language Acquisition: An
  Introductory Course. Mahwah, NJ: Lawrence Erlbaum Associates, second
  edition.
- Granger, S. (1998). Chapter 1. The computerized learner corpus: a
  versatile new source of data for SLA research.
  In S. Granger (ed.), Learner English on computer, London; New
  York: Longman.
- Granger, S. (2003). Error-tagged learner corpora and CALL: A promising
  synergy. CALICO Journal 20(3), 465-480.
- Granger, S. (2004). Computer learner corpus research: current status
  and future prospects.
  In U. Connor & T. Upton (eds.), Applied Corpus Linguistics: A
  Multidimensional Perspective, Amsterdam & Atlanta: Rodopi, pp. 123-145.
  URL http://cecl.fltr.ucl.ac.be/Downloads/Indianapolis&
- Han, N.-R., M. Chodorow & C. Leacock (2006). Detecting Errors in
  English Article Usage by Non-Native Speakers.
  Natural Language Engineering 12(2), 115-129.
- Izumi, E. (2006). The NICT Japanese Learner Corpus (JLE) Corpus
  Project. Tech. rep., National Institute of Information and
  Communications Technology (NICT).
  URL http://www2.nict.go.jp/x/x161/en/member/izumi_emi/project.html.
- Izumi, E., K. Uchimoto & H. Isahara (2004). SST speech corpus of
  Japanese learners' English and automatic detection of learners' errors.
  ICAME Journal 28, 31-48.
  URL http://icame.uib.no/ij28/Izumi.pdf.
- Izumi, E., K. Uchimoto & H. Isahara (2005). Error Annotation for Corpus
  of Japanese Learner English.
  In Proceedings of the Second International Joint Conference on
  Natural Language Processing.
- Izumi, E., K. Uchimoto, T. Saiga, T. Supnithi & H. Isahara (2003).
  Automatic Error Detection in the Japanese Learners' English Spoken Data.
  In The Companion Volume to the Proceedings of 41st Annual
  Meeting of the Association for Computational Linguistics. Sapporo, Japan:
  Association for Computational Linguistics, pp. 145-148.
  URL http://www.aclweb.org/anthology/P03-2024.
- Lee, J. & S. Seneff (2006). Automatic Grammar Correction for
  Second-Language Learners. In INTERSPEECH 2006 - ICSLP.
  URL http://groups.csail.mit.edu/sls/publications/2006/IS061299.pdf.
- Leech, G. (2004). Chapter 2. Adding Linguistic Annotation.
  In M. Wynne (ed.), Developing Linguistic Corpora: a Guide to
  Good Practice, Oxford: Oxbow Books.
  URL http://ahds.ac.uk/creating/guides/linguistic-corpora/chapter2.htm.
- Milton, J. C. P. & N. Chowdhury (1994). Tagging the interlanguage of
  Chinese learners of English.
  In Proceedings joint seminar on corpus linguistics and
  lexicology, Guangzhou and Hong Kong, 19-22 June, 1993, Language Centre,
  HKUST, Hong Kong. pp. 127-143.
  URL http://hdl.handle.net/1783.1/1087.
- Nagata, R., A. Kawai, K. Morihiro & N. Isu (2006). A Feedback-Augmented
  Method for Detecting Errors in the Writing of Learners of English.
  In Proceedings of ACL-COLING-06. Sydney, Australia, pp. 241-248.
  URL http://www.aclweb.org/anthology/P06-1031.
- Nesselhauf, N. (2004). Learner corpora and their potential for
  language teaching.
  In J. M. Sinclair (ed.), How to Use Corpora in Language
  Teaching, John Benjamins, pp. 125-152.
- Nicholls, D. (2003). The Cambridge Learner Corpus - error coding and
  analysis for lexicography and ELT.
  In D. Archer, P. Rayson, A. Wilson & T. McEnery (eds.), Proceedings of
  the Corpus Linguistics 2003 Conference (CL 2003). Lancaster
  University: University Centre for Computer Corpus Research on Language,
  Technical Papers 16, pp. 572-581.
  URL http://ucrel.lancs.ac.uk/publications/CL2003/papers/nicholls.pdf.
- Passonneau, R. J. (1997). Applying Reliability Metrics to Co-Reference
  Annotation. CoRR. URL http://arxiv.org/abs/cmp-lg/9706011.
- Tetreault, J. & M. Chodorow (2008). The Ups and Downs of Preposition
  Error Detection in ESL Writing. In Proceedings of COLING-08. Manchester.
  URL http://www.ets.org/Media/Research/pdf/r3.pdf.
- Tono, Y. (2000). A corpus-based analysis of interlanguage development:
  analysing POS tag sequences of EFL learner corpora.
  In PALC'99: Practical Applications in Language Corpora. pp. 323-340.
  URL http://leo.meikai.ac.jp/~tono/paper/palc99.pdf.
- Tono, Y. (2003). Learner corpora: design, development and applications.
  In D. Archer, P. Rayson, A. Wilson & T. McEnery (eds.), Proceedings of
  the Corpus Linguistics 2003 Conference (CL 2003). Technical Papers 16.
  Lancaster University: University Centre for Computer Corpus Research
  on Language. pp. 800-809.
  URL http://ucrel.lancs.ac.uk/publications/CL2003/papers/tono.pdf.
Detmar Meurers
2009-01-22