Linguistic Modeling and its Interfaces
Oberseminar, Detmar Meurers, Winter Semester 2016
This series features presentations and discussions of current issues in linguistic modeling and its interfaces. This includes linguistic modeling in computational linguistics, language acquisition research, Intelligent Computer-Assisted Language Learning, and education, as well as theoretical linguistic research with a focus on the interfaces of syntax and information structure. It is open to anyone interested in this interdisciplinary enterprise.
A list of talks in previous semesters can be found here:
Abstract While the formal pragmatic concepts in information structure, such as the focus of an
utterance, are precisely defined in theoretical linguistics and potentially very useful in conceptual
and practical terms, it has turned out to be difficult to reliably annotate such notions in
corpus data (Ritz et al., 2008; Calhoun et al., 2010). We present a large-scale focus
annotation effort designed to overcome this problem. Our annotation study is based
on the task-based corpus CREG (Ott et al., 2012), which consists of answers to
explicitly given reading comprehension questions. We compare focus annotation by
trained annotators with a crowd-sourcing setup making use of untrained native speakers.
Given the task context and an annotation process incrementally making the question
form and answer type explicit, the trained annotators reach substantial agreement for
focus annotation. Interestingly, the crowd-sourcing setup also supports high-quality
annotation – for specific subtypes of data. Finally, we turn to the question of whether the
relevance of focus annotation can be extrinsically evaluated. We show that automatic
short-answer assessment significantly improves for focus-annotated data. The
focus-annotated CREG corpus is freely available and constitutes the largest such resource for
German.
Abstract Despite the promise of research conducted at the intersection of Computer-Assisted
Language Learning (CALL), Natural Language Processing (NLP) and Second Language
Acquisition (SLA), few studies have explored the potential benefits of using Intelligent
Computer-Assisted Language Learning (ICALL) systems to deepen our understanding of the
process and products of second language (L2) learning. The strategic use of technology offers
researchers novel methodological opportunities to examine how incremental changes in L2
development occur during treatment, as well as how the longitudinal impacts of experimental
interventions on L2 learning outcomes occur on a case-by-case basis. Drawing on the pilot results
from a project examining the effects of automatic input enhancement on L2 learners’ development,
this article explores how the use of technology offers additional methodological and analytical
choices for the investigation of the process and outcomes of L2 development, illustrating
the opportunities to study what learners do during visually enhanced instructional
activities.
Abstract Combining fair, informative questions into a balanced exam requires skill and experience.
What are the properties of a good test, and how can we ensure they are met? I will discuss the
“naive” approach to test creation and the use of probabilistic test theory and their respective
strengths: The naive approach (write questions, assign points, determine absolute point threshold
for passing) gives test-takers maximum transparency of expected performance, while item
norming according to probabilistic test theory more easily allows the automated creation
of tests from large item banks or user-adaptive testing. I will briefly present ongoing
research that uses insights from both methods to better understand questions (and
answers).
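To make the contrast concrete, here is a minimal sketch (illustrative only, not taken from the talk; all names and numbers are invented): the naive approach checks a summed score against an absolute threshold, while probabilistic test theory models the probability of a correct response from a test-taker's ability and an item's normed difficulty, as in the one-parameter Rasch model.

    import math

    def passes_naive(points_earned, pass_threshold):
        # "Naive" approach: fixed absolute point threshold for passing.
        return points_earned >= pass_threshold

    def rasch_probability(ability, difficulty):
        # One-parameter Rasch model: P(correct) given test-taker ability
        # and item difficulty, both on a logit scale.
        return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

    # Hypothetical example: an item of average difficulty (0.0)
    # seen by three test-takers of differing ability.
    for theta in (-1.0, 0.0, 1.5):
        print(f"ability {theta:+.1f}: P(correct) = {rasch_probability(theta, 0.0):.2f}")
    print("passes (naive, 60-point threshold):", passes_naive(72, 60))

Items whose difficulties have been normed in this way can be drawn automatically from an item bank to assemble tests of comparable overall difficulty, which is what makes automated test creation and user-adaptive testing feasible.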
Abstract Informed by research on readability and language acquisition, computational linguists
have developed sophisticated tools for the analysis of linguistic complexity. While some tools are
starting to become accessible on the web, there still is a disconnect between the features that can
in principle be identified based on state-of-the-art computational linguistic analysis, and the
analyses a second language acquisition researcher, teacher, or textbook writer can readily obtain
and visualize for their own collection of texts. This short paper presents the development of a
web-based tool that aims to meet this challenge. The Common Text Analysis Platform (CTAP) is
designed to support fully configurable linguistic feature extraction for a wide range of complexity
analyses. It features a user-friendly interface, modularized and reusable analysis component
integration, and flexible corpus and feature management. Building on the Unstructured
Information Management Architecture (UIMA), CTAP readily supports the integration of
state-of-the-art NLP and complexity feature extraction while maintaining modularization and reusability.
CTAP thereby aims at providing a common platform for complexity analysis, encouraging research
collaboration and sharing of feature extraction components—to jointly advance the
state-of-the-art in complexity analysis in a form that readily supports real-life use by ordinary
users.
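As a rough illustration of the kind of feature extraction such a platform modularizes (a hypothetical sketch; the function names are invented and do not reflect CTAP's actual UIMA components), two classic complexity measures can be computed from plain text:

    import re

    def tokenize(text):
        # Very rough tokenizer; a real analysis component would do much better.
        return re.findall(r"[A-Za-zÄÖÜäöüß]+", text.lower())

    def type_token_ratio(tokens):
        # Lexical diversity: distinct word forms divided by total tokens.
        return len(set(tokens)) / len(tokens) if tokens else 0.0

    def mean_sentence_length(text):
        # Average number of tokens per sentence, using a naive sentence split.
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        return sum(len(tokenize(s)) for s in sentences) / len(sentences) if sentences else 0.0

    sample = "Complexity analysis supports teachers. It also supports textbook writers."
    print("Type-token ratio:", round(type_token_ratio(tokenize(sample)), 2))
    print("Mean sentence length:", mean_sentence_length(sample))

In a platform like CTAP, measures of this kind are wrapped as reusable analysis components so that researchers, teachers, or textbook writers can combine them into their own feature sets without programming.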
Abstract Ever since Learner Corpus Research (LCR) came into being, it has been viewed as having
great potential to impact upon the related fields of Second Language Acquisition (SLA) and
Foreign Language Teaching (FLT). Due to disparate research methodologies, however, the links
between the three fields have proven to be somewhat weak. While SLA studies have made
extensive use of experimental data and assessed the impact of a variety of learner variables,
in particular of the cognitive and affective types, LCR studies have often remained
monofactorial, relying on no more than a single predictor in their study of learner
language and omitting cognitive and affective variables from their designs. Likewise, even
though studies of FLT materials have been undertaken, results have not usually been
linked to the output produced by the learners taught via the materials in question.
In our study, we analyse learner language, experimental data, individual differences
and teaching materials. By assessing the impact of Content and Language Integrated
Learning (CLIL) on the acquisition of the English passive by German learners at the
secondary level, we show that, while monofactorial approaches may yield interesting results,
multifactorial designs, i.e. approaches taking into account a variety of predictors and different
types of data, necessarily provide greater insight into the processes at work in language
acquisition.
Abstract Many students are challenged with the demand of writing cohesive explanations. To
support students in writing cohesive explanations, we developed CohVis, a computer-based
feedback tool that visualizes cohesion deficits of students’ expository texts in a concept map.
CohVis uses natural language processing technologies and state-of-the-art visualization methods to
automatically provide students with formative graphical feedback about the cohesion of their texts.
In this talk, I will first provide an overview of the methodologies implemented in CohVis.
I will then present several studies in which we investigated the effectiveness of such feedback as
well as the underlying cognitive processes.
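As a purely illustrative sketch (not CohVis's actual method; the heuristic and names are invented), one simple way to flag potential cohesion gaps is to check how much vocabulary adjacent sentences share: sentence pairs with little or no overlap are candidates for missing connections that a concept-map visualization could highlight.

    import re

    def content_words(sentence):
        # Crude word extraction; a real system would lemmatize and filter stopwords.
        return set(re.findall(r"[A-Za-zÄÖÜäöüß]{4,}", sentence.lower()))

    def cohesion_gaps(text, min_overlap=1):
        # Flag adjacent sentence pairs sharing fewer than min_overlap content words.
        sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
        gaps = []
        for a, b in zip(sentences, sentences[1:]):
            if len(content_words(a) & content_words(b)) < min_overlap:
                gaps.append((a, b))
        return gaps

    text = ("Photosynthesis converts light into energy. The cell stores this energy. "
            "Rainfall varies by season.")
    for a, b in cohesion_gaps(text):
        print("possible cohesion gap between:", repr(a), "and", repr(b))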
Abstract In this talk, I will discuss specific LFG proposals for the analysis of resultative and benefactive constructions. Some authors have suggested phrasal analyses, that is, analyses that assume a certain fixed phrase structure that licenses these constructions. The proposals under consideration are interesting since they use a resource-sensitive semantics (glue semantics) that makes it possible to use templates that remap arguments to different grammatical functions. This in turn makes it possible to use inheritance techniques to inherit constraints that account for passivization. So, although arguments are added in phrase structure configurations, the passive can be accounted for lexically and interacts appropriately with arguments that are introduced configurationally.
I will show that such approaches nevertheless have shortcomings, namely in the interaction with
derivational morphology and in the interaction of the constructions under consideration with other
phenomena such as passivization and extraction. I will also show that such phrasal approaches do
not capture cross-linguistic generalizations. I develop a lexical rule-based approach
in the framework of HPSG, which is also implemented in grammars of English and
German.
The semantics-based part of the original paper is rather complex and cannot be discussed in
full detail during the talk. Those who are really interested in these issues may read
the paper by Asudeh et al. (2014) to facilitate discussion during and after the talk:
http://web.stanford.edu/group/cslipublications/cslipublications/LFG/19/papers/lfg14asudehetal.pdf
Part of the material that will be covered in the talk is here:
http://hpsg.fu-berlin.de/~stefan/Pub/phrasal-lfg-headlex2016.html
Chris Biemann (Univ. Hamburg): Adaptive Natural Language Processing in the Cognitive Computing Paradigm
Abstract In the past decades of NLP, there has been a steady shift away from rule-based,
linguistically motivated modeling towards statistical learning and the induction of unsupervised
feature representations. However, natural language components used in today’s NLP pipelines are
still static in the sense that their statistical model or rule-base is created once, then subsequently
applied without further change. In this talk, I will motivate an adaptive approach to
natural language processing, where NLP components get smarter through usage over
time, following a ‘cognitive computing’ paradigm. With
the help of recent research prototypes, three stages of data-driven adaptation will be
illustrated: feature/resource induction, induction of processing components and continuous
data-driven learning. Finally, I will discuss challenges in the evaluation of adaptive NLP
components.
Barbara Geist (Univ. Leipzig): How pupils talk about spellings – First results from the
‘Rechtschreibgespräche’ corpus
Abstract Reciprocal learning, in which pupils explain concepts to one another, is among the instructional measures with particularly strong learning effects. Spelling discussions (Rechtschreibgespräche) prompt children to do exactly that and challenge them to examine spellings systematically in cooperative exchange and to make their reasoning explicit. Although numerous publications in practice-oriented journals address how to initiate and design such spelling discussions, there are as yet no scientific studies that take a closer look at the subject-related teaching and learning processes in this didactic setting. Research on classroom communication has so far primarily examined (teacher-centred) whole-class instruction and only rarely pupil-pupil interaction in small groups (but see Riegler 2010). Studies of classroom communication have also disregarded the pupils' linguistic background (German as a first or second language).
The project investigates how pupils with German as a first and as a second language talk about spellings in small groups and in the whole-class setting. From the perspective of spelling didactics, it is of interest which spellings the pupils classify as difficult and which explanations they (jointly) formulate. From the perspective of language acquisition research and the competence area “Sprechen und Zuhören” (speaking and listening; KMK 2004), we analyse which linguistic structures the pupils use to exchange views about the given subject matter. Do parameters such as utterance length differ between pupils of different grade levels and/or between pupils with German as a first versus a second language? In the future, in addition to qualitative analyses of the teaching and learning processes, we plan to analyse linguistic correlates of “Bildungssprache” (academic language), e.g. syntactic properties such as verb placement and the size of the middle field.
The design of the study and the results of a contrastive case analysis of one discussion among
female pupils with German as a first and as a second language and one among female pupils with
German as a second language only will be presented and put up for discussion.
Abstract Standard supervised learning is learning from gold standard examples. This scenario is infeasible if examples have rich structure (parses, translations) and are required in large amounts. We present an approach to perform structured prediction by learning from interaction, following an online learning protocol where on each of a sequence of iterations, the learner receives an input, predicts an output structure, and receives partial feedback in form of a task loss evaluation of the predicted structure. Under this type of “bandit feedback”, the learner does not know what the correct prediction looks like, nor what would have happened if it had predicted differently. We present new learning objectives and algorithms for this interactive scenario, focusing on convergence speed and ease of elicitability of feedback, and several simulation experiments for different NLP tasks (see recent papers at ACL 2016 (http://www.cl.uni-heidelberg.de/~riezler/publications/papers/ACL2016.pdf) and NIPS 2016 (http://www.cl.uni-heidelberg.de/~riezler/publications/papers/NIPS2016.pdf)). Furthermore, we give an outlook on lifting this scenario to neural networks and how to perform offline learning for Bandit Structured Prediction.
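To make the protocol concrete, the following is a minimal, self-contained sketch (a simplification with invented names and a toy task, not the specific objectives or algorithms from the papers linked above) of bandit learning for a log-linear model: on each round the learner samples one candidate structure, observes only that structure's task loss, and takes a stochastic gradient step that decreases the expected loss.

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_output(w, features):
        # Sample one candidate structure from a softmax over candidate scores and
        # return the gradient of log p(chosen | input) for a log-linear model.
        scores = features @ w
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        i = rng.choice(len(probs), p=probs)
        grad_log_p = features[i] - probs @ features
        return i, grad_log_p

    def bandit_learning(data, dim, lr=0.1, epochs=20):
        # Bandit structured prediction, schematically: the learner never sees the
        # gold structure, only the task loss of the structure it predicted.
        w = np.zeros(dim)
        for _ in range(epochs):
            for features, losses in data:          # losses[i] = task loss of candidate i
                i, grad_log_p = sample_output(w, features)
                w -= lr * losses[i] * grad_log_p   # stochastic step on the expected loss
        return w

    # Toy data: two inputs, each with two candidate "structures" (rows of feature
    # vectors) and the task loss the environment would return for each candidate.
    toy = [
        (np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([0.0, 1.0])),
        (np.array([[0.5, 0.2], [0.1, 0.9]]), np.array([0.2, 0.8])),
    ]
    print("learned weights:", bandit_learning(toy, dim=2))

In this simplification the update is a REINFORCE-style step that lowers the probability of a sampled structure in proportion to its loss; the objectives discussed in the talk refine this basic idea with respect to convergence speed and the kind of feedback that can realistically be elicited.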
_________________________________________________________________________________
Last updated: January 31, 2017