Linguistics 795.10: Seminar in Linguistics (Autumn 2007)
Unbounded Dependency Constructions
between Linguistic Theory and Corpus Annotation
Bob Levine and Detmar Meurers
The analysis of unbounded dependency constructions (UDCs) has been a
major focus in theoretical linguistics since the beginnings of
generative syntax. This attention is understandable given that
extraction and other UDC phenomena run counter to the general and
intuitively plausible tendency of natural language to establish
local relations between a head and its dependent – which naturally
raised the question under what conditions non-local relations can be
established and what their properties are.
UDC phenomena also occur frequently in real-life natural language
and thus have been analyzed as part of more recent,
corpus-linguistic and computational-linguistic efforts annotating
text with syntactic structures and categories, such as the widely
used Penn Treebank containing English texts from the Wall Street
Journal. Among other uses, such linguistically annotated corpora
serve to train parsers, which can then be used to
license syntactic structures for previously unseen data. Unbounded
dependency structures have received significant attention in this
domain as well given that they run counter to the independence of
the local trees that statistical parsers typically assume. Current
computational research thus has returned to the question of what to
represent locally, how to represent unbounded dependencies, and how
to reconstruct the long-distance relations in parsing results that do
not include a representation of UDCs as such.
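To make the representational question concrete, here is a minimal Python sketch of how a Penn Treebank-style bracketing marks an unbounded dependency: the filler carries a numeric index (WHNP-1) and the gap is an empty element (*T*-1) bearing the same index. The example tree is hand-constructed for illustration and is not taken from the corpus, and the function names (parse, walk, words, link_traces) are hypothetical helpers, not part of any treebank toolkit.

```python
import re

# Hand-constructed example in Penn Treebank-style bracketing (an assumption,
# not corpus data): "What did Kim say Sandy bought?"
TREE = """
(SBARQ (WHNP-1 (WP What))
  (SQ (VBD did) (NP-SBJ (NNP Kim))
    (VP (VB say)
      (SBAR (-NONE- 0)
        (S (NP-SBJ (NNP Sandy))
          (VP (VBD bought) (NP (-NONE- *T*-1)))))))
  (. ?))
"""

def parse(text):
    """Parse a bracketed tree into nested (label, children) tuples; leaves are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    return node()

def walk(tree, path=()):
    """Yield (labels from root to node, node) for every node in the tree."""
    label, children = tree
    here = path + (label,)
    yield here, tree
    for child in children:
        if isinstance(child, tuple):
            yield from walk(child, here)

def words(tree):
    """Collect the overt words under a node, skipping empty elements."""
    label, children = tree
    if label == "-NONE-":
        return []
    out = []
    for child in children:
        out.extend(words(child) if isinstance(child, tuple) else [child])
    return out

def link_traces(tree):
    """Pair each *T*-n trace with the constituent whose label ends in -n."""
    fillers, traces = {}, {}
    for path, (label, children) in walk(tree):
        indexed = re.search(r"-(\d+)$", label)
        if indexed:
            fillers[indexed.group(1)] = (label, " ".join(words((label, children))))
        if label == "-NONE-":
            trace = re.fullmatch(r"\*T\*-(\d+)", children[0])
            if trace:
                traces[trace.group(1)] = path
    return [(fillers[i][0], fillers[i][1], traces[i])
            for i in sorted(traces) if i in fillers]

for filler_label, filler_words, trace_path in link_traces(parse(TREE)):
    print(filler_label, repr(filler_words), "->", " > ".join(trace_path))
```

The path from the root to the trace crosses an embedded SBAR and S, which is exactly the non-local information that a parser built on independent local trees does not carry and that has to be reconstructed afterwards.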
In this seminar we want to bring the two strands of research and
their insights together: We plan to start with an overview of
unbounded dependency phenomena based on Levine and Hukari (2006). On
this foundation, we then jointly explore how the different types of
UDCs have been analyzed in current treebanks. We expect that this
will both provide interesting empirical challenges to
linguistic theory and highlight the shortcomings of the
analyses provided in the corpus annotation (and how these
shortcomings could be remedied using linguistic insight).
Depending on the interest of the participants, we can also include a
discussion of recent computational approaches to reconstructing
unbounded dependencies in syntactic structures resulting from
probabilistic context-free parsers trained on the annotated corpora
– addressing where this can work and where it is bound to fail,
based on the linguistic insights into the phenomena and their
annotation discussed.
Instructors:
- Bob Levine
  - Email: levine@ling.osu.edu
  - Office hours: Tuesdays 9:00–11:00 (or make an appointment by email)
  - Office: 214 Oxley Hall
- Detmar Meurers
  - Email: dm@ling.osu.edu
  - Office hours: Mondays 2:00–3:00 (or make an appointment by email)
  - Office: 201A Oxley Hall (enter through 201 lab; if locked, knock loudly)
Course prerequisites: An introduction to Syntax.
Course meets: Tuesdays/Thursdays, 11:30–1:18 in 132 Cunz Hall
Course website: http://purl.org/net/dm/07/autumn/795.10/
Course email: 795.10 (in our local net, i.e., @ling.osu.edu)
This reaches Bob, Detmar, and everyone enrolled in the seminar.
Anonymous feedback: If you have comments, complaints, or
ideas you'd like to send us anonymously, you can use the web form at
http://purl.org/net/dm/feedback/ to do so. Please send us
ordinary email for anything that you'd like to receive a reply
to; there really is no way for us to find out who sent something
via the anonymous feedback form!
Students with Disabilities:
Students who need an accommodation based on the impact of a disability
should contact me to arrange an appointment as soon as possible to
discuss the course format, to anticipate needs, and to explore
potential accommodations. I rely on the Office for Disability Services
for assistance in verifying the need for accommodations and developing
accommodation strategies. Students who have not previously contacted
the Office for Disability Services are encouraged to do so
(292-3307; http://www.ods.ohio-state.edu).
Academic Misconduct: To state the obvious, academic
dishonesty is not allowed. Cheating on assignments will be reported to
the University Committee on Academic Misconduct. The most common form
of misconduct is plagiarism. Remember that any time you use the ideas
or the materials of another person, you must acknowledge that you have
done so in a citation. This includes material that you have found on
the Web. The University provides guidelines for research at
http://gateway.lib.ohio-state.edu/tutor/.
Nature of course: This is a research-oriented seminar, i.e.,
each participant is expected to take an active role as a researcher.
More concretely, each participant is expected to
- actively participate in the class discussion
- explore and present a topic: select and discuss with us what
  you'll explore, thoroughly research it, and present it in class using
  overheads/handouts
- work out a research/project idea related to the topic of this
  seminar and write a short paper (roughly 10–15 pages) about it
Reading material: Articles dealing with specific topics will
be assigned throughout the course, often by the person taking charge
of one of the subtopics.
The linguistic discussion in the first half of the course will
mostly be based on the following book:
- Robert D. Levine and Thomas E. Hukari (2006): "The Unity of
  Unbounded Dependency Constructions". CSLI Publications
Please obtain a copy before the beginning of the course, e.g., from
your favorite local bookstore or for $37.50 (free shipping) at
Amazon.com and BarnesAndNoble.com.
In the first half of the course we'll also discuss the following two
articles as background for the use of corpora for theoretical
linguistics:
- Detmar Meurers and Stefan Müller (to appear, 2007): “Corpora
  and Syntax”. Article 44 in Lüdeling, A. and Kytö,
  M. Corpus Linguistics. Handbooks of Linguistics and
  Communication Science. Berlin: Mouton de Gruyter. Preprint
  available from
  http://purl.org/net/dm/papers/meurers-mueller-05.html
- Detmar Meurers (2005): “On the use of electronic corpora for
  theoretical linguistics. Case studies from the syntax of
  German”. Lingua Volume 115 (11). Preprint available from
  http://purl.org/net/dm/papers/meurers-03.html
Schedule:
- Week 1 (Thursday, Sept. 20): Introduction (handout)
- Week 2 (Sept. 25/27): Phenomena and Issues (handout)
- Week 3 (Oct. 2/4): cont.
- Week 4 (Oct. 9/11): Corpora and Syntactic Research
- Week 5 (Oct. 16/18): cont. (Exploring UDCs with the Penn Treebank Activity)
- Week 6
  - Oct. 23: Adjunct Extraction (Bob)
  - Oct. 25: cont.
- Week 7
  - Oct. 30: Context and UDCs (Bob)
    - Reading: Kehler (2003): Coherence, Reference, and
      the Theory of Grammar, ch. 5, and as background ch. 1 and 2
  - Nov. 1: Context and UDCs (Detmar, handout)
- Week 8
  - Nov. 6: Head-Filler Extraction Path Marking (Chris)
  - Nov. 8: Preposition Stranding in Old English (Salena)
- Week 9
  - Nov. 13: Relative Clauses in Japanese and Korean (Jungmee, Yusuke)
  - Nov. 15: Psycholinguistic Processing Accounts of UDCs (Lia)
- Week 10
- Week 11
  - Nov. 27: Identifying UDCs in Parsing (Raja)
  - Nov. 29: Clitics and UDCs in French (Scott)