Linguistic Modeling and its Interfaces
Oberseminar, Detmar Meurers, Winter Semester 2014/2015
This series features presentations and discussions of current issues in linguistic modeling and its interfaces. This includes linguistic modeling in computational linguistics, language acquisition research, Intelligent Computer-Assisted Language Learning, and education, as well as theoretical linguistic research with a focus on the interfaces of syntax and information structure. It is open to anyone interested in this interdisciplinary enterprise.
Language Learning Tasks and Automatic Analysis of Learner Language. Connecting FLTL and NLP in the design of ICALL materials supporting effective use in real-life instruction
Abstract: We argue that the successful integration of ICALL materials demands a design process that treats pedagogical and computational requirements as equally important. Our investigation pursues two goals. The first is to integrate insights from Second Language Acquisition and Foreign Language Teaching and Learning with insights from computational linguistic modelling into task design. The second is to facilitate the integration of ICALL materials into real-world instruction settings, as opposed to research- or lab-oriented settings, by empowering teachers with the methodology and the technology to autonomously author such materials.
To achieve the first goal, we propose an ICALL material design process that combines basic principles of Task-Based Language Instruction and Task-Based Test Design with the specification requirements of Natural Language Processing. The relation between pedagogical and computational requirements is elucidated by exploring (i) the formal features of foreign language learning activities, (ii) the complexity and variability of learner language, and (iii) the feasibility of applying computational techniques for the automatic analysis and evaluation of learner responses.
To achieve the second goal, we propose an automatic feedback generation strategy that enables
teachers to customise the computational resources required to automatically correct ICALL
activities without the need for programming skills. This proposal is instantiated and
evaluated in real-world instruction settings involving teachers and learners in secondary
education.
Visual Input Enhancement of the Web (VIEW): Integration of Age of
Acquisition and Word Frequency Measures
(Internship Report)
Eduard Schaf (Computational Linguistics, Universität Tübingen):
Visual input enhancement of Russian web pages with Finite-State
Transducer Technology
(BA Thesis report)
also note: SfS Colloquium Festival in the afternoon
On the impact of different computer-mediated textual enhancement conditions on L2 learners’ acquisition of articles
The talk presents a study examining the effectiveness of computer-mediated textual enhancement on second language learners’ implicit and explicit knowledge of English article use. Learners logged into a visual input enhancement system, WERTi, which provides learners with authentic news texts and modifies all the English articles in the texts through different enhancement techniques. Students at Lancaster University were assigned to five conditions: colorizing, clicking, multiple choice, fill-in-the-blanks, and a control condition. The first four groups were provided with enhanced texts, whereas the control group read the unenhanced texts. A pretest and a posttest were conducted immediately before and after the two-week treatment, during which participants were asked to read 30 news texts through WERTi. Four tests, adapted from Akakura (2009), measured acquisition: an elicited production test, an oral production test, a grammaticality judgment test, and a metalinguistic knowledge test. The results showed that participants in the fill-in-the-blanks group developed implicit knowledge of English article use. Furthermore, a positive effect of the multiple-choice enhancement condition was found on the measures of explicit knowledge. These findings contribute to research on the effect of computer-mediated Focus on Form (FonF) and have pedagogical implications for learners’ autonomous learning.
The talk builds on the MA thesis research of Wenjing Li and reports on a collaboration with José
Luis Moreno, Sarah Grey, Nicole Ziegler, Detmar Meurers and Patrick Rebuschat.
Native Language Identification: Background and Current Research Update
Native Language Identification (NLI) tackles the problem of determining the native language of an
author based on a text the author has written in a second language (L2). The research has
theoretical relevance, e.g., it contributes to the understanding of L1 transfer in
SLA, as well as practical relevance, e.g., for language learning, author profiling and
security applications. NLI started to attract attention in computational linguistics
with the work of Koppel et al. (2005). Since then, the interest has increased steadily,
leading to the First NLI Shared Task in 2013, with 29 participating teams from around
the globe (Tetreault et al., 2013). The talk provides an introduction to the field by
briefly sketching its development and presenting an overview of the current issues and
trends.
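By way of illustration (not part of the talk), NLI is commonly cast as supervised text classification over the L2 texts. The sketch below uses character n-gram features with a linear SVM, a setup typical of many 2013 shared-task systems; all texts and L1 labels are invented placeholders.

```python
# Minimal NLI baseline sketch: character n-gram features with a linear SVM.
# Texts and L1 labels are toy placeholders, not real corpus data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy L2-English texts labelled with the (hypothetical) L1 of their authors.
texts = [
    "I am agree with this opinion because ...",
    "Since three years I study in this university ...",
    "He suggested me to take the morning train ...",
    "Yesterday I have seen a very interesting film ...",
]
labels = ["ES", "DE", "IT", "FR"]  # placeholder L1 codes

# Character n-grams capture spelling and morphosyntactic transfer patterns.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LinearSVC(),
)
model.fit(texts, labels)

print(model.predict(["For me is very important to finish the work early ..."]))
```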
Joint Morphological and Syntactic Analysis for Richly Inflected Languages
Over the past decade, a variety of graph-based and transition-based dependency parsing approaches have been developed. Yet none of these approaches integrated parsing with part-of-speech tagging. By contrast, conventional phrase-structure parsers have long included part-of-speech tagging as an integrated component.
In this talk, I will present our latest dependency-parsing system, which combines part-of-speech
tagging, morphological analysis, and syntactic analysis. Starting off with a transition-based model for
joint part-of-speech tagging and dependency parsing, we explored ways of integrating
morphological features into the model. We also investigated the use of rule-based morphological
analysers and of word clusters to tackle the sparsity of lexical features. I will present an
evaluation of the model’s performance on five morphologically rich languages. We show
consistent improvements for our joint parsing over a conventional pipeline model with
regard to both morphological and syntactic accuracy. Our model obtains performance
results that go beyond the current state of the art in dependency parsing for this set of
languages.
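To make the joint idea concrete, here is a schematic sketch (not the system presented in the talk) of an arc-standard-style transition system in which the SHIFT transition also assigns a part-of-speech/morphological tag, so that tagging and attachment decisions are made within the same derivation. The tags and the example sentence are invented, and no scoring model is included.

```python
# Schematic sketch of a joint transition system: SHIFT also assigns a tag,
# so tagging and parsing decisions compete in the same derivation.
from dataclasses import dataclass, field


@dataclass
class Config:
    buffer: list                                 # indices of words still to be read
    stack: list = field(default_factory=list)
    tags: dict = field(default_factory=dict)     # word index -> assigned tag
    arcs: set = field(default_factory=set)       # (head, dependent) pairs


def shift(cfg, tag):
    """Move the next word onto the stack and jointly assign it a tag."""
    i = cfg.buffer[0]
    return Config(cfg.buffer[1:], cfg.stack + [i], {**cfg.tags, i: tag}, set(cfg.arcs))


def left_arc(cfg):
    """Attach the second-topmost stack word to the topmost one."""
    s = cfg.stack
    return Config(cfg.buffer, s[:-2] + [s[-1]], dict(cfg.tags), cfg.arcs | {(s[-1], s[-2])})


def right_arc(cfg):
    """Attach the topmost stack word to the second-topmost one."""
    s = cfg.stack
    return Config(cfg.buffer, s[:-1], dict(cfg.tags), cfg.arcs | {(s[-2], s[-1])})


# Toy derivation for "sie liest Buecher" (word indices 0, 1, 2):
cfg = Config(buffer=[0, 1, 2])
cfg = shift(cfg, "PRON|Nom.Sg")     # tag chosen jointly with the parse
cfg = shift(cfg, "VERB|3.Sg.Pres")
cfg = left_arc(cfg)                 # sie <- liest
cfg = shift(cfg, "NOUN|Acc.Pl")
cfg = right_arc(cfg)                # liest -> Buecher
print(cfg.tags, cfg.arcs)
```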
The Influence of Text Length on Latent Semantic Analysis
(BA Thesis report)
Abstract: This thesis examines the influence of text length on Latent Semantic Analysis (LSA), a statistical technique based on the distributional properties of words. With an LSA tool, both essays and short answers were scored, and the correlations with the scores of human raters were calculated. For source-dependent essays, the correlations of the LSA scores with the scores of human raters decreased, as expected, when the essays were shortened. Furthermore, the question whether the LSA scores for short answers improve the accuracy of an ensemble classifier is discussed and explored.
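As a minimal illustration (not the thesis tool), LSA-based scoring can be sketched as truncated SVD over a tf-idf matrix followed by cosine similarity between a response and a reference text; the corpus, texts, and dimensionality below are invented.

```python
# Minimal LSA scoring sketch: build a latent space with truncated SVD over
# tf-idf vectors and score a response by cosine similarity to a reference.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import make_pipeline
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Photosynthesis converts light energy into chemical energy.",
    "Plants use chlorophyll to absorb light for photosynthesis.",
    "Cellular respiration releases energy stored in glucose.",
    "Mitochondria are the site of cellular respiration.",
]

# Fit the latent space on the background corpus.
lsa = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2))
lsa.fit(corpus)

reference = ["Plants turn light into chemical energy using chlorophyll."]
answer = ["Chlorophyll lets plants absorb light and store its energy."]

# The LSA score is the cosine similarity in the reduced space; shorter texts
# yield sparser vectors, which is where text length can affect reliability.
score = cosine_similarity(lsa.transform(answer), lsa.transform(reference))[0, 0]
print(round(float(score), 3))
```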
Analyzing focus in authentic data from an explicit task context
Abstract: Focus has been discussed extensively in the theoretical literature, but these theories have
rarely been tested on authentic data. Following substantial annotation efforts in two authentic
task-based corpora, we encountered a number of issues that are problematic for current theories. In this
talk, we will present data from the CREG corpus, a collection of reading comprehension exercises,
and Stuttgart 21, a spoken language corpus. We will discuss concrete examples illustrating
phenomena that have not been considered by current focus theories, for example not-at-issue but
relevant material, and answers that do not directly address the current Question Under Discussion (QUD). We will
try to characterize these data and show how they can be handled in our annotation
schemata.
Self-assessment with question-answer pairs automatically generated from an ontology
Abstract: We want to develop a system for self-assessment based on automatically generated free-text questions. The idea is to (semi-)automatically extract an ontology from the lecture slides and other texts on the seminar topic. The ontology will be an interesting basis for automatically building question and answer pairs. Question generation as suggested here will have a different focus than current state-of-the-art question generation systems (see Rus et al. (2011) on the First Shared Task Evaluation Challenge on Question Generation 2010), which create questions whose answer lies in one specific sentence or paragraph of the texts. Questions generated from individual sentences or small paragraphs focus on details. Based on an ontology, one might construct broader questions which can only be answered when the broader relations between learning concepts are known. The advantage of basing question generation on an ontology, compared to question generation from single sentences or paragraphs in the lecture material, is that such questions will be very difficult to answer by simply learning all the text on the lecture slides by heart. The usefulness of ontology-based questions can be tested in a self-assessment system. The automatic assessment of free-text student answers in the self-assessment tests may be based on our previous work (Kiefer and Pado, 2015), where we developed a system that assists professors in correcting short student answers to free-text questions. The system is based on meaning comparison and can sort student answers by their similarity to reference answers provided by the professor.
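As a rough illustration of that last step (not the Kiefer and Pado system, whose meaning comparison is more sophisticated), sorting student answers by similarity to a reference answer can be sketched with a simple tf-idf cosine; all answers below are invented.

```python
# Illustrative sketch: sort student answers by their similarity to an
# instructor's reference answer, approximated here with tf-idf cosine.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reference = "An ontology defines concepts and the relations between them."
student_answers = [
    "It is a set of concepts together with the relations that hold between them.",
    "An ontology is a kind of database.",
    "No idea, something about knowledge maybe.",
]

vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform([reference] + student_answers)
similarities = cosine_similarity(vectors[0], vectors[1:])[0]

# Present the answers ordered from most to least similar to the reference.
for sim, answer in sorted(zip(similarities, student_answers), reverse=True):
    print(f"{sim:.2f}  {answer}")
```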
Sarah Grey (Department of Psychology, Center for Language Science, Pennsylvania State University)
Neurocognition of foreign language learning in different instructional contexts: An ERP study of bilingual L3A
Abstract: There are more bi- and multilinguals in the world today than monolinguals (Marian & Schook, 2012) and, by extension, many of our foreign language learners arrive in the language classroom already knowing two or more languages. Despite this, the bulk of our scientific knowledge about adult foreign language learning comes from research on monolingual learners. Psycholinguistic and cognitive science research often demonstrates key differences in language processing and cognition between bi- or multilinguals and monolinguals (see Bialystok, Craik & Luk, 2012; Bialystok, 2013). Additionally, studies in second language acquisition (SLA) often show differences in learning outcomes as a function of the explicitness (grammar information provided) of foreign language instruction.
The current study aimed to address critical gaps in our knowledge of adult foreign language learning. It examined adult bilinguals’ learning of a foreign language under two different instructional contexts, Instructed (with grammar rules) and Uninstructed (no grammar rules, meaningful examples only), at different points in the learning trajectory (low and high experience), and for different language structures (word order and gender agreement). We gathered both behavioral and electrophysiological (event-related potential, ERP) data and compared our findings for the bilinguals with similar work that has been done on monolinguals, to elucidate how behavioral and neural outcomes differ between bilingual and monolingual foreign language learners.
Empirical Analysis of Patterns in Written Language
Abstract: With each new incoming cohort, students show greater deficits in mathematics, which are quantified through entrance tests and require a growing number of preparatory courses. Reading and writing weaknesses, by contrast, are not tested, yet lecturers increasingly notice deficiencies here as well. This talk examines approaches to quantifying these deficits and their origins. In search of a suitable progression for defining and quantifying the errors that occur, it is shown how the phonics method from the English-speaking world can be transferred to German. An analysis of texts written by children suggests that for some error types no substantial improvement can be observed over the course of schooling. The phonics categories, however, allow for a more differentiated analysis of the data. A look at the teaching materials and texts suggests that, for some categories, schools do not provide a structured method for generalizing what has been learned, and that, on the other hand, learners do not work out these rules intuitively over time either.
_________________________________________________________________________________
Last updated: February 6, 2015