Linguistic Modeling and its Interfaces
Oberseminar, Detmar Meurers, Winter Semester 2014/2015
This series features presentations and discussions of current issues in linguistic modeling and its interfaces. This includes linguistic modeling in computational linguistics, language acquisition research, Intelligent Computer-Assisted Language Learning, and education, as well as theoretical linguistic research with a focus on the interfaces of syntax and information structure. It is open to anyone interested in this interdisciplinary enterprise.
Language Learning Tasks and Automatic Analysis of Learner Language. Connecting FLTL and NLP in the design of ICALL materials supporting effective use in real-life instruction
Abstract: We argue that the successful integration of ICALL materials demands a design process that treats pedagogical and computational requirements as equally important. Our investigation pursues two goals. The first is to integrate insights from Second Language Acquisition and Foreign Language Teaching and Learning with insights from computational linguistic modelling into task design. The second is to facilitate the integration of ICALL materials into real-world instruction settings, as opposed to research- or lab-oriented settings, by empowering teachers with the methodology and the technology to autonomously author such materials.
To achieve the first goal, we propose an ICALL material design process that combines basic principles of Task-Based Language Instruction and Task-Based Test Design with the specification requirements of Natural Language Processing. The relation between pedagogical and computational requirements is elucidated by exploring (i) the formal features of foreign language learning activities, (ii) the complexity and variability of learner language, and (iii) the feasibility of applying computational techniques for the automatic analysis and evaluation of learner responses.
To achieve the second goal, we propose an automatic feedback generation strategy that enables
teachers to customise the computational resources required to automatically correct ICALL
activities without the need for programming skills. This proposal is instantiated and
evaluated in real-world instruction settings involving teachers and learners in secondary
education.
Visual Input Enhancement of the Web (VIEW): Integration of Age of
Acquisition and Word Frequency Measures
(Internship Report)
Eduard Schaf (Computational Linguistics, Universität Tübingen):
Visual input enhancement of Russian web pages with Finite-State
Transducer Technology
(BA Thesis report)
also note: SfS Colloquium Festival in the afternoon
On the impact of different computer-mediated textual enhancement conditions on L2 learners’ acquisition of articles
The talk presents a study examining the effectiveness of computer-mediated textual enhancement on second language learners’ implicit and explicit knowledge of English article use. Learners logged into a visual input enhancement system, WERTi, which provides learners with authentic news texts and modifies all the English articles in the texts through different enhancement techniques. Students at Lancaster University were assigned to five conditions: colorizing, clicking, multiple choice, fill-in-the-blanks, and a control condition. The first four groups were provided with enhanced texts, whereas the control group read the unenhanced texts. A pretest and a posttest were conducted immediately before and after the two-week treatment, during which participants were asked to read 30 news texts through WERTi. Four tests, adapted from Akakura (2009), measured acquisition: an elicited production test, an oral production test, a grammaticality judgment test, and a metalinguistic knowledge test. The results showed that participants in the fill-in-the-blanks group developed implicit knowledge of English article use. Furthermore, a positive effect of the multiple-choice enhancement condition was found on the measures of explicit knowledge. These findings contribute to research on the effect of computer-mediated Focus on Form (FonF) and have pedagogical implications for learners’ autonomous learning.
The talk builds on the MA thesis research of Wenjing Li and reports on a collaboration with José
Luis Moreno, Sarah Grey, Nicole Ziegler, Detmar Meurers and Patrick Rebuschat.
Native Language Identification: Background and Current Research Update
Native Language Identification (NLI) tackles the problem of determining the native language of an
author based on a text the author has written in a second language (L2). The research has
theoretical relevance, e.g., it contributes to the understanding of L1 transfer in
SLA, as well as practical relevance, e.g., for language learning, author profiling and
security applications. NLI started to attract attention in computational linguistics
with the work of Koppel et al. (2005). Since then, the interest has increased steadily,
leading to the First NLI Shared Task in 2013, with 29 participating teams from around
the globe (Tetreault et al., 2013). The talk provides an introduction to the field by
briefly sketching its development and presenting an overview of the current issues and
trends.
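By way of illustration (not part of the talk), NLI is commonly cast as supervised text classification over the L2 texts. The sketch below uses character n-gram features with a linear SVM, a setup typical of many 2013 shared-task systems; all texts and L1 labels are invented placeholders.

```python
# Minimal NLI baseline sketch: character n-gram features with a linear SVM.
# Texts and L1 labels are toy placeholders, not real corpus data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy L2-English texts labelled with the (hypothetical) L1 of their authors.
texts = [
    "I am agree with this opinion because ...",
    "Since three years I study in this university ...",
    "He suggested me to take the morning train ...",
    "Yesterday I have seen a very interesting film ...",
]
labels = ["ES", "DE", "IT", "FR"]  # placeholder L1 codes

# Character n-grams capture spelling and morphosyntactic transfer patterns.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LinearSVC(),
)
model.fit(texts, labels)

print(model.predict(["For me is very important to finish the work early ..."]))
```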
Joint Morphological and Syntactic Analysis for Richly Inflected Languages
Over the past decade, a variety of graph-based and transition-based dependency parsing approaches have been developed. Yet none of these approaches integrated parsing with part-of-speech tagging. By contrast, conventional phrase-structure parsers have long included part-of-speech tagging as an integrated component.
In this talk, I will present our latest dependency-parsing system, which combines part-of-speech
tagging, morphological analysis, and syntactic analysis. Starting off with a transition-based model for
joint part-of-speech tagging and dependency parsing, we explored ways of integrating
morphological features into the model. We also investigated the use of rule-based morphological
analysers and of word clusters to tackle the sparsity of lexical features. I will present an
evaluation of the model’s performance on five morphologically rich languages. We show
consistent improvements for our joint parsing over a conventional pipeline model with
regard to both morphological and syntactic accuracy. Our model obtains performance
results that go beyond the current state of the art in dependency parsing for this set of
languages.
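To make the joint idea concrete, here is a schematic sketch (not the system presented in the talk) of an arc-standard-style transition system in which the SHIFT transition also assigns a part-of-speech/morphological tag, so that tagging and attachment decisions are made within the same derivation. The tags and the example sentence are invented, and no scoring model is included.

```python
# Schematic sketch of a joint transition system: SHIFT also assigns a tag,
# so tagging and parsing decisions compete in the same derivation.
from dataclasses import dataclass, field


@dataclass
class Config:
    buffer: list                                 # indices of words still to be read
    stack: list = field(default_factory=list)
    tags: dict = field(default_factory=dict)     # word index -> assigned tag
    arcs: set = field(default_factory=set)       # (head, dependent) pairs


def shift(cfg, tag):
    """Move the next word onto the stack and jointly assign it a tag."""
    i = cfg.buffer[0]
    return Config(cfg.buffer[1:], cfg.stack + [i], {**cfg.tags, i: tag}, set(cfg.arcs))


def left_arc(cfg):
    """Attach the second-topmost stack word to the topmost one."""
    s = cfg.stack
    return Config(cfg.buffer, s[:-2] + [s[-1]], dict(cfg.tags), cfg.arcs | {(s[-1], s[-2])})


def right_arc(cfg):
    """Attach the topmost stack word to the second-topmost one."""
    s = cfg.stack
    return Config(cfg.buffer, s[:-1], dict(cfg.tags), cfg.arcs | {(s[-2], s[-1])})


# Toy derivation for "sie liest Buecher" (word indices 0, 1, 2):
cfg = Config(buffer=[0, 1, 2])
cfg = shift(cfg, "PRON|Nom.Sg")     # tag chosen jointly with the parse
cfg = shift(cfg, "VERB|3.Sg.Pres")
cfg = left_arc(cfg)                 # sie <- liest
cfg = shift(cfg, "NOUN|Acc.Pl")
cfg = right_arc(cfg)                # liest -> Buecher
print(cfg.tags, cfg.arcs)
```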
The Influence of Text Length on Latent Semantic Analysis
(BA Thesis report)
Abstract: This thesis examines the influence of text length on Latent Semantic Analysis (LSA), a statistical technique based on the distributional properties of words. With an LSA tool, both essays and short answers were scored, and the correlations with the scores of human raters were calculated. For source-dependent essays, the correlations of the LSA scores with the scores of human raters decreased, as expected, when the essays were shortened. Furthermore, the question whether the LSA scores for short answers improve the accuracy of an ensemble classifier is discussed and explored.
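As a minimal illustration (not the thesis tool), LSA-based scoring can be sketched as truncated SVD over a tf-idf matrix followed by cosine similarity between a response and a reference text; the corpus, texts, and dimensionality below are invented.

```python
# Minimal LSA scoring sketch: build a latent space with truncated SVD over
# tf-idf vectors and score a response by cosine similarity to a reference.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import make_pipeline
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Photosynthesis converts light energy into chemical energy.",
    "Plants use chlorophyll to absorb light for photosynthesis.",
    "Cellular respiration releases energy stored in glucose.",
    "Mitochondria are the site of cellular respiration.",
]

# Fit the latent space on the background corpus.
lsa = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2))
lsa.fit(corpus)

reference = ["Plants turn light into chemical energy using chlorophyll."]
answer = ["Chlorophyll lets plants absorb light and store its energy."]

# The LSA score is the cosine similarity in the reduced space; shorter texts
# yield sparser vectors, which is where text length can affect reliability.
score = cosine_similarity(lsa.transform(answer), lsa.transform(reference))[0, 0]
print(round(float(score), 3))
```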
Analyzing focus in authentic data from an explicit task context
Abstract: Focus has been discussed extensively in the theoretical literature, but these theories have
rarely been tested on authentic data. Following substantial annotation efforts in two authentic
task-based corpora, we encountered a number of issues that are problematic for current theories. In this
talk, we will present data from the CREG corpus, a collection of reading comprehension exercises,
and Stuttgart 21, a spoken language corpus. We will discuss concrete examples illustrating
phenomena that have not been considered by current focus theories, for example not-at-issue but
relevant material, and answers that do not directly address the current Question Under Discussion (QUD). We will
try to characterize these data and show how they can be handled in our annotation
schemata.
Self-assessment with question-answer pairs automatically generated from an ontology
Abstract: We want to develop a system for self-assessment based on automatically generated free-text questions. The idea is to (semi-)automatically extract an ontology from the lecture slides and other texts on the seminar topic. The ontology will be an interesting basis for automatically building question and answer pairs. Question generation as suggested here will have a different focus than current state-of-the-art question generation systems (see Rus et al. (2011) on the First Shared Task Evaluation Challenge on Question Generation 2010), which create questions whose answer lies in one specific sentence or paragraph of the texts. Questions generated from individual sentences or small paragraphs focus on details. Based on an ontology, one might construct broader questions which can only be answered when the broader relations between learning concepts are known. The advantage of basing question generation on an ontology, compared to question generation from single sentences or paragraphs in the lecture material, is that such questions will be very difficult to answer by simply learning all the text on the lecture slides by heart. The usefulness of ontology-based questions can be tested in a self-assessment system. The automatic assessment of free-text student answers in the self-assessment tests may be based on our previous work (Kiefer and Pado, 2015), where we developed a system that assists professors in correcting short student answers to free-text questions. The system is based on meaning comparison and can sort student answers by their similarity to reference answers provided by the professor.
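As a rough illustration of that last step (not the Kiefer and Pado system, whose meaning comparison is more sophisticated), sorting student answers by similarity to a reference answer can be sketched with a simple tf-idf cosine; all answers below are invented.

```python
# Illustrative sketch: sort student answers by their similarity to an
# instructor's reference answer, approximated here with tf-idf cosine.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reference = "An ontology defines concepts and the relations between them."
student_answers = [
    "It is a set of concepts together with the relations that hold between them.",
    "An ontology is a kind of database.",
    "No idea, something about knowledge maybe.",
]

vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform([reference] + student_answers)
similarities = cosine_similarity(vectors[0], vectors[1:])[0]

# Present the answers ordered from most to least similar to the reference.
for sim, answer in sorted(zip(similarities, student_answers), reverse=True):
    print(f"{sim:.2f}  {answer}")
```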
Sarah Grey (Department of Psychology, Center for Language Science, Pennsylvania State University)
Neurocognition of foreign language learning in different instructional contexts: An ERP study of bilingual L3A
Abstract: There are more bi- and multilinguals in the world today than monolinguals (Marian & Schook, 2012) and, by extension, many of our foreign language learners arrive in the language classroom already knowing two or more languages. Despite this, the bulk of our scientific knowledge about adult foreign language learning comes from research on monolingual learners. Psycholinguistic and cognitive science research often demonstrates key differences in language processing and cognition between bi- or multilinguals and monolinguals (see Bialystok, Craik & Luk, 2012; Bialystok, 2013). Additionally, studies in second language acquisition (SLA) often show differences in learning outcomes as a function of the explicitness (grammar information provided) of foreign language instruction.
The current study aimed to address critical gaps in our knowledge of adult foreign language learning. It examined adult bilinguals’ learning of a foreign language under two different instructional contexts, Instructed (with grammar rules) and Uninstructed (no grammar rules, meaningful examples only), at different points in the learning trajectory (low and high experience), and for different language structures (word order and gender agreement). We gathered both behavioral and electrophysiological (event-related potential, ERP) data and compared our findings for the bilinguals with similar work that has been done on monolinguals, to elucidate how behavioral and neural outcomes differ between bilingual and monolingual foreign language learners.
Empirical Analysis of Patterns in Written Language
Abstract: With each new incoming cohort, students show greater deficits in mathematics, which are quantified through entrance tests and require a growing number of preparatory courses. Reading and writing weaknesses, by contrast, are not tested, yet lecturers increasingly notice deficiencies here as well. This talk examines approaches to quantifying these deficits and their origins. In search of a suitable progression for defining and quantifying the errors that occur, it is shown how the phonics method from the English-speaking world can be transferred to German. An analysis of texts written by children suggests that for some error types no substantial improvement can be observed over the course of schooling. The phonics categories, however, allow for a more differentiated analysis of the data. A look at the teaching materials and texts suggests that, for some categories, schools do not provide a structured method for generalizing what has been learned, and that, on the other hand, learners do not work out these rules intuitively over time either.
_________________________________________________________________________________
Last updated: February 6, 2015