Linguistics 795.10: Seminar in Linguistics (Autumn 2007)
Unbounded Dependency Constructions between Linguistic Theory and Corpus Annotation
Bob Levine and Detmar Meurers


The analysis of unbounded dependency constructions (UDCs) has been a major focus in theoretical linguistics since the beginnings of generative syntax. This attention is understandable given that extraction and other UDC phenomena run counter to the general and intuitively plausible tendency of natural language to establish local relations between a head and its dependent – which naturally raised the question under what conditions non-local relations can be established and what their properties are.

UDC phenomena also occur frequently in real-life natural language and thus have been analyzed as part of more recent corpus-linguistic and computational-linguistic efforts to annotate text with syntactic structures and categories, such as the widely used Penn Treebank, which contains English texts from the Wall Street Journal. One use of such linguistically annotated corpora is to train parsers, which can then be used to license syntactic structures for previously unseen data. Unbounded dependency structures have received significant attention in this domain as well, given that they run counter to the independence of the local trees that statistical parsers typically assume. Current computational research has thus returned to the question of what to represent locally, how to represent unbounded dependencies, and how to reconstruct the long-distance relations in parsing results that do not include a representation of UDCs as such.

In this seminar we want to bring the two strands of research and their insights together. We plan to start with an overview of unbounded dependency phenomena based on Levine and Hukari (2006). On this foundation, we will then jointly explore how the different types of UDCs have been analyzed in current treebanks. We expect that this will both provide interesting empirical challenges to the linguistic theory and highlight the shortcomings of the analyses provided in the corpus annotation (and how these shortcomings could be remedied using linguistic insight).

Depending on the interest of the participants, we can also include a discussion of recent computational approaches to reconstructing unbounded dependencies in syntactic structures resulting from probabilistic context-free parsers trained on the annotated corpora – addressing where this can work and where it is bound to fail, based on the linguistic insights into the phenomena and their annotation discussed.

Instructors: Bob Levine and Detmar Meurers
Course prerequisites: An introduction to Syntax.


Course meets: Tuesdays/Thursdays, 11:30–1:18 in 132 Cunz Hall


Course website: http://purl.org/net/dm/07/autumn/795.10/


Course email: 795.10 (in our local net, i.e., @ling.osu.edu)
This reaches Bob, Detmar, and everyone enrolled in the seminar.


Anonymous feedback: If you have comments, complaints, or ideas you'd like to send us anonymously, you can use the web form at http://purl.org/net/dm/feedback/ to do so. Please send us ordinary email for anything that you'd like to receive a reply to; there really is no way for us to find out who sent something via the anonymous feedback form!


Students with Disabilities: Students who need an accommodation based on the impact of a disability should contact us to arrange an appointment as soon as possible to discuss the course format, to anticipate needs, and to explore potential accommodations. We rely on the Office for Disability Services for assistance in verifying the need for accommodations and developing accommodation strategies. Students who have not previously contacted the Office for Disability Services are encouraged to do so (292-3307; http://www.ods.ohio-state.edu).


Academic Misconduct: To state the obvious, academic dishonesty is not allowed. Cheating on assignments will be reported to the University Committee on Academic Misconduct. The most common form of misconduct is plagiarism. Remember that any time you use the ideas or the materials of another person, you must acknowledge that you have done so in a citation. This includes material that you have found on the Web. The University provides guidelines for research at http://gateway.lib.ohio-state.edu/tutor/.


Nature of course: This is a research-oriented seminar, i.e., each participant is expected to take an active role as a researcher. More concretely, each participant is expected to
  1. actively participate in the class discussion.
  2. explore and present a topic: select a topic and discuss it with us, research it thoroughly, and present it in class using overheads/handouts.
  3. work out a research/project idea related to the topic of this seminar and write a short paper (roughly 10–15 pages) about it.

Reading material: Articles dealing with specific topics will be assigned throughout the course, often by the person taking charge of one of the subtopics.

The linguistic discussion in the first half of the course will mostly be based on the following book:

In the first half of the course we'll also discuss the following two articles as background for the use of corpora for theoretical linguistics:
Schedule: