Linguistic Modeling and its Interfaces
Oberseminar, Detmar Meurers, Winter Semester 2013/2014
This series features presentations and discussions of current issues in linguistic modeling and its interfaces. This includes linguistic modeling in computational linguistics, language acquisition research, Intelligent Computer-Assisted Language Learning, and education, as well as theoretical linguistic research with a focus on the interfaces of syntax and information structure. It is open to anyone interested in this interdisciplinary enterprise.
A list of talks in previous semesters can be found here: Summer 13, Winter 12/13, Summer 12, Winter 11/12, Summer 11, Summer 10, Winter 09/10, Summer 09
Abstract: In this talk I will present a new approach to focus, and more generally information structure, that I developed in my Master’s thesis. Focus has traditionally been seen as a grammatical notion (although most people would be hard pressed to say what they actually mean by the term “grammatical”). Recent research indicates that language processing is influenced by unpredictability as measured through surprisal (a.k.a. Shannon information). In short, it appears that if a word is unpredictable, it is pronounced with longer duration and is also harder to react to. Focus, on the other hand, has the useful property of lengthening focused words, thus allowing more processing time. I therefore attempt to (partially) tie focus to unpredictability. I do so under the assumption that language is a form of human behaviour, which in turn implies that it has social impact. In a large-scale online study I tested how people rate social variables when focus placement is manipulated in question-answer dialogues. The results indicate that focus does indeed affect the dimensions of friendliness and sincerity. Interestingly, if we take information theory seriously and interpret information as a means of uncertainty reduction, we can integrate the social component of focus with its predictability component. In my talk I will outline how this can be achieved.
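Surprisal, as used in the abstract, is simply the negative log-probability of a word given its context. As a rough illustration only (the bigram estimate, the add-one smoothing, and the toy corpus below are minimal stand-ins, not anything from the thesis):

```python
import math
from collections import Counter

def bigram_surprisal(corpus_tokens, context, word):
    """Surprisal (in bits) of `word` given the preceding `context` word,
    estimated from raw bigram counts with add-one smoothing."""
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    unigrams = Counter(corpus_tokens)
    vocab = len(set(corpus_tokens))
    p = (bigrams[(context, word)] + 1) / (unigrams[context] + vocab)
    return -math.log2(p)

tokens = "the cat sat on the mat the cat ran".split()
# The frequent continuation "the cat" carries fewer bits of surprisal
# than the unseen continuation "the ran":
assert bigram_surprisal(tokens, "the", "cat") < bigram_surprisal(tokens, "the", "ran")
```

On this view, a focused (unpredictable) word would simply be one whose surprisal value is high relative to its neighbours.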
Andrea Horbach
Reducing supervision in scoring short answer exercises
Abstract: In this talk I will present our work on Computer-Assisted Language Learning with the aim of reducing teacher effort in grading short answer exercises. We present two strands of work: one using the CREG corpus and the other using data that has been collected during placement tests for learners of German as a Foreign Language at Saarland University.
Short answer exercises such as reading or listening comprehension questions are a common assessment strategy in foreign language learning, and it seems intuitively likely that students would make use of the reading text in constructing answers. First, we discuss an annotation study which explores the question of whether this intuition is reflected in data from the CREG corpus. We find that instructor-supplied target answers as well as correct student answers often link to the same portion of the text, while incorrect student answers often refer to passages of the text that have nothing to do with the correct answer.
Next, we evaluate whether these findings can be leveraged for automatic short answer scoring. We build a very simple classifier that relies solely on whether the student answer relates to the same passage of the reading text as the target answer. This classifier performs below the state of the art, but it suggests possibilities for developing automatic answer scoring systems that need less supervision from instructors.
Third, I will discuss ongoing joint work with Magdalena Wolska in which we explore the effectiveness of clustering student answers for reducing teachers’ effort in a manual grading scenario. Using data from Saarland University placement tests, we simulate a grading scenario which assumes that a teacher only labels one answer per cluster. We find that labeling on average 40% of the student answer types is enough to reach an accuracy of 90%.
In future work we will consider how to best integrate these theoretical findings into teacher interfaces and real grading scenarios.
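The label-one-answer-per-cluster scenario can be sketched as follows; clustering by normalized string (i.e. treating exact answer types as clusters) and the toy answers are illustrative stand-ins, not the actual clustering or data from the placement tests:

```python
from collections import defaultdict

def simulate_cluster_grading(answers, gold_labels, cluster_of):
    """Simulate a grading scenario in which the teacher labels one answer
    per cluster and that label is propagated to all cluster members."""
    clusters = defaultdict(list)
    for i, ans in enumerate(answers):
        clusters[cluster_of(ans)].append(i)
    correct = 0
    for members in clusters.values():
        teacher_label = gold_labels[members[0]]  # teacher grades one member
        correct += sum(gold_labels[i] == teacher_label for i in members)
    effort = len(clusters) / len(answers)    # fraction of answers labeled
    accuracy = correct / len(answers)        # agreement with gold labels
    return effort, accuracy

answers = ["der Hund", "Der Hund", "die Katze", "der  Hund", "eine Katze"]
gold    = ["correct",  "correct",  "wrong",     "correct",   "wrong"]
effort, acc = simulate_cluster_grading(
    answers, gold, lambda a: " ".join(a.lower().split()))
```

Here effort is the fraction of answers the teacher must grade by hand (one per cluster), and accuracy is the fraction of answers whose propagated label matches the gold label; better clusterings trade these two quantities off.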
Alexis Palmer (Universität des Saarlandes)
Active learning in the real world
Abstract: Following on the previous talk, I will discuss our next approach to reducing the amount of supervision needed for short answer scoring: namely, active learning. Active learning (AL) is a specialized machine learning scenario in which the (machine) learner guides the selection of examples to be annotated. The aim is to maximize the usefulness of human annotation effort while achieving sufficient classification accuracy. Though there is a significant literature on the use of AL for natural language processing, relatively few studies have considered what happens when AL is applied in real-world annotation settings.
In this talk, I will discuss results from a study of the effectiveness of AL in the real-world context of documenting the Mayan language Uspanteko, in which we find that the most appropriate way of combining machine and human resources depends to some extent on the expertise of the human annotator. I will then show our first steps toward implementing AL in the short answer scoring context.
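The basic pool-based AL loop with uncertainty sampling can be sketched with a deliberately tiny one-dimensional learner; the threshold classifier, the numeric "documents," and the oracle below are hypothetical illustrations, not the setup used in either study:

```python
# Pool-based active learning with uncertainty sampling: at each round the
# learner asks the human (oracle) to label the pool item it is least
# certain about, i.e. the item closest to its current decision boundary.

def train(labeled):
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2  # threshold

def uncertainty(x, threshold):
    return -abs(x - threshold)  # closer to the boundary = more uncertain

def active_learning(pool, oracle, seed, rounds):
    labeled = [(x, oracle(x)) for x in seed]
    pool = [x for x in pool if x not in seed]
    for _ in range(rounds):
        t = train(labeled)
        query = max(pool, key=lambda x: uncertainty(x, t))  # most uncertain
        labeled.append((query, oracle(query)))              # ask the human
        pool.remove(query)
    return train(labeled)

# Toy task: classify numbers as >= 5 (label 1) or < 5 (label 0).
oracle = lambda x: int(x >= 5)
threshold = active_learning(list(range(10)), oracle, seed=[0, 9], rounds=4)
```

The point of the loop is that the queried items cluster around the decision boundary, so a few targeted annotations pin it down; the expertise of the annotator matters precisely because these boundary cases are the hardest to label.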
Michael Hahn (Universität Tübingen)
Distributional Semantics of Phrasal Units
Abstract: The availability of large corpora of written and spoken language has significantly enriched
the empirical foundation of linguistic research. At the same time, it arguably is refocusing
language-related research towards questions which can readily be addressed by observing surface
evidence, such as which words (co)occur, with which frequencies, in which contexts. To step from
an investigation of past language use towards predictions generalizing across language tasks and
domains, the annotation of corpora with abstract linguistic properties serves an important role.
The talk explores the role and relevance of systematic corpus annotation using case studies
from the analysis of learner corpora, records of language produced by second language
learners.
(dry run of invited talk at Herrenhausen Conference: “(Digital) Humanities Revisited – Challenges
and Opportunities in the Digital Age”)
Abstract: Word formation is one of the major mechanisms for the expansion of the vocabulary in a language. Knowledge of lexical morphology (and derivation/affixation in particular) includes information about a word’s morphological complexity, the meaning and syntactic function of affixes, and the restrictions that govern the attachment of affixes to bases. In L2 acquisition, this knowledge is important for both the decoding of unknown words and the production of new words that have not yet been acquired. In addition, it can also be beneficial for the ad-hoc formation of words, e.g. when coping with problems in lexical search. Thus, knowledge of L2 derivational morphology is likely to have a positive effect on the size of both receptive and productive vocabulary.
Surprisingly, though, there is comparatively little empirical research on L2 learners’ productive use of derivational morphology (Lessard and Levison 2001, Schmitt and Zimmerman 2002, González Álvarez 2004; see Plag 2009 for review). Other research has focused on learners’ grammatical knowledge of individual affixes (Schmitt and Meara 1997, Mochizuki and Aizawa 2000), and on the usefulness of productive word formation as a strategy to facilitate vocabulary acquisition (e.g. Morin 2003) and lexical search (Zimmermann 2002).
This talk discusses the potential of learner corpora for the investigation of advanced learners’ knowledge and use of productive derivational morphology in their written L2 English, with a focus on questions of cross-linguistic influence (CLI).
Ulla König-Cardanobile (Universität Tübingen)
An Information-Theoretic Approach to Quantifying Complexity: The Case of German Noun Inflection
Ramon Ziai (Universität Tübingen)
Update on Advancing Content Assessment in Context: The Role of
Information Structure and Answer Typing
Abstract: Despite today’s digital tools and writing aids, the production of well-formed, linguistically correct, stylistically adequate, and target- and audience-tailored documents remains a challenge for writers; in a study of error types in native-language student writing, Lunsford and Lunsford (2008) found errors similar to those Connors and Lunsford (1988) had identified in a comparable study 20 years before. The number of spelling errors had decreased dramatically; however, the texts contained similar numbers of “subject-verb agreement errors,” “missing words,” “unnecessary shifts in verb tense,” or “fused sentences,” a clear indication that such errors cannot be detected and corrected by automatic checkers.
When produced by skilled writers, these errors can be considered performance errors, typically introduced while revising and editing text, rather than competence errors. These errors should therefore be prevented by offering appropriate editing functions for writers. However, for developing such functions, we first need a clear understanding of the causes of such errors.
The concept of action slips proposed by Norman (1981) offers a very strong theoretical framework that considers both the process and the product: some failure in a procedure results in an error in the product. However, error analysis (in writing research, second language acquisition, and natural-language processing) has traditionally focused on the product, i.e., the errors visible in the finished text, but has not addressed the writing process, i.e., the editing operations that preceded the error.
I present an approach to systematically analyze complex writing errors to distinguish typos from
revision errors and identify the areas where writers could benefit most from better
tools.
Abstract: In this talk, I will present the results of my investigation of the effect of different
instructional parameters within an interactive application for computer-assisted language
learning (CALL). More specifically, I examined different forms of CALL interaction and
their effect on language learning. My research is motivated by existing work on two
widely discussed issues within the discipline of second language acquisition. One is
the debate that pits form against meaning and leads to a discussion of the extent to
which language instruction should focus on linguistic forms and formal correctness as
opposed to emphasizing communicative skills and the ability to use the language to
make meaning in the real world. Related to that is the second controversial issue which
concerns the dichotomy between implicit and explicit knowledge and learning: How
explicit or implicit should instruction be, how does the degree of explicitness affect
the development of explicit and implicit knowledge, and how do these two types of
knowledge contribute to language skills? I will report on experiments with learners of
German who practiced with one of three versions of a text-based dialog system, each of
which realised a different degree of explicitness and a different weight on form versus
meaning.
Abstract: In this talk, I will present an overview of two strands of my research: the first half (joint
work with Amber Smith) will cover some methods for detecting errors in manually- and
automatically-annotated syntactic corpora (i.e., parser errors). These rather simple
anomaly-finding methods tend to work well for low-resource situations; are independent of parser,
language, or annotation scheme; and seem to be leading towards a connection with parse revision.
The second half (joint work with Marwa Ragheb) focuses on a project of syntactically annotating
English as a Second Language (ESL) data, the decisions we have had to make, and our first
steps in automatically parsing this data. Preliminary results suggest that anomaly
detection methods can help clean up our training data. Additionally, the improvement
with hand-written post-processing of parse results is an encouragement to develop
a grammar-based parse revision system, of a kind the first half of the talk connects
with.
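One simple family of anomaly-finding methods flags items that receive inconsistent labels across the corpus, in the spirit of variation-n-gram error detection. The POS-tag version below is a minimal sketch (the tiny corpus is invented, and a realistic detector would also condition on the surrounding context rather than on the word alone):

```python
from collections import defaultdict

def variation_nuclei(tagged_corpus):
    """Return words that receive more than one tag across the corpus:
    candidates for annotation (or parser) errors to inspect by hand."""
    tags_seen = defaultdict(set)
    for sentence in tagged_corpus:
        for word, tag in sentence:
            tags_seen[word].add(tag)
    return {w: tags for w, tags in tags_seen.items() if len(tags) > 1}

corpus = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("the", "DT"), ("dog", "VB"), ("barks", "VBZ")],  # "dog" mis-tagged
]
flagged = variation_nuclei(corpus)  # only "dog" varies in its tag
```

Note that such a detector needs no parser, language model, or annotation-scheme-specific machinery, which is why methods of this kind transfer well to low-resource settings.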
Abstract: One motivation for the task of Native Language Identification (NLI) – attempting to detect the native language (L1) of a writer writing in a second language – is to identify which particular characteristics of the writing show effects of the L1: whether a particular pattern of article use might indicate a Chinese L1 speaker, and so on. In this talk (on joint work with Jojo Wong) I’ll be discussing two types of feature we’ve looked at in the NLI classification task, one a representation of syntactic structure, the other a collocation ‘topic model’ learnt by Bayesian inference; and I’ll look at what kinds of information these sorts of features might give us about cross-linguistic effects.
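As a schematic illustration of the NLI setup, here is a minimal generative classifier over surface word profiles. The use of raw word counts is a deliberately simple stand-in for the syntactic and topic-model features actually discussed, and the training data is fabricated for the example:

```python
import math
from collections import Counter

def train_nli(docs_by_l1):
    """Per-L1 word counts (used later with add-one smoothing).
    `docs_by_l1` maps an L1 label to a list of documents in the L2."""
    models = {}
    for l1, docs in docs_by_l1.items():
        counts = Counter(w for d in docs for w in d.split())
        models[l1] = (counts, sum(counts.values()))
    return models

def predict_l1(models, doc, vocab_size=1000):
    """Pick the L1 whose smoothed unigram model gives `doc` the
    highest log-likelihood (a naive-Bayes-style decision)."""
    def score(l1):
        counts, total = models[l1]
        return sum(math.log((counts[w] + 1) / (total + vocab_size))
                   for w in doc.split())
    return max(models, key=score)

# Toy data: "A"-L1 writers overuse "the"; "B"-L1 writers omit it.
models = train_nli({"A": ["the the the cat"], "B": ["cat cat cat cat"]})
```

Inspecting which features most strongly separate the L1 classes is then exactly the step that connects classification accuracy back to claims about cross-linguistic effects.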
_________________________________________________________________________________
Last updated: June 2, 2014