Linguistics 684.01: Introduction to CL I (Winter 08)
This course for graduates and advanced undergraduates provides
an introduction to theory-driven computational linguistics (sometimes
referred to as “symbolic CL”), focusing on syntax/parsing. The
course includes some formal background and emphasizes linking the
theoretical discussions to practical experience implementing
algorithms and small grammars in Prolog.
The course is the first half of a two-course introduction to CL. The second
half, 684.02, focuses on data-intensive, statistical CL and is offered
by Chris Brew in Spring.
Instructor: Detmar Meurers
- Office: 201a Oxley Hall (enter through 201 computer lab; if locked, knock loudly)
- Phone: 292-0461 (usually email works better)
- Email: dm@ling.osu.edu
- Office hours: Thursdays 2:00-3:00, or by appointment
(just email me)
Course meets: Tuesday and Thursday 3:30-5:18pm in 140
Jennings Hall (JE)
Course website:
http://purl.org/net/dm/08/winter/684.01/
The updated syllabus, assignments, slides, etc. will be posted there,
so check it regularly.
Course email: 684.01@ling.osu.edu
Mail sent to this address is forwarded to the official email addresses
(Name.Number@osu.edu) of all students enrolled in the class
and the instructor. Note that you should read email sent to your
official OSU account on a daily basis; it also helps you avoid
high library fines!
Anonymous feedback: If you have comments, complaints, or
ideas you'd like to send me anonymously, you can use the web form at
http://purl.org/net/dm/feedback/ to do so. Please send me
ordinary email for anything that you'd like to receive a reply
to—there really is no way for me to find out who sent me something
via the anonymous feedback form!
Academic Misconduct: To state the obvious, academic
dishonesty is not allowed and will be reported to the University
Committee on Academic Misconduct.
Students with Disabilities:
Students who need an accommodation based on the impact of a disability
should contact me to arrange an appointment as soon as possible to
discuss the course format, to anticipate needs, and to explore
potential accommodations. I rely on the Office for Disability Services
for assistance in verifying the need for accommodations and developing
accommodation strategies. Students who have not previously contacted
the Office for Disability Services are encouraged to do so
(292-3307; http://www.ods.ohio-state.edu).
Successful course participation involves:
- Regular attendance and active participation (30% of grade)
- Taking reading assignments seriously and completing five or six
  homework assignments, some paper and pencil, some programming in
  Prolog (handed out Thursday, handed in by Tuesday's class) (50%)
- Final project, to be handed in during finals week, Wednesday, March 12
  (20%):
  - Part I: implementing a grammar for a short, real-life text
  - Part II: implementing an Earley parser in the programming
    language of your choice
Course prerequisites: An understanding of the basics of
linguistic analysis, syntax, and formal foundations.
Topics:
- In the course, you will see three recurring aspects, with
  increasing complexity:
  - data structures used for linguistic signs
  - formalisms for expressing grammars using these data structures
  - parsing algorithms for processing with those grammars
- The specific topics we will cover are the following:
  - Finite state machines and regular languages
    (handout, exercise sheet 1)
  - Implementing finite state machines in Prolog
    (handout, exercise sheet 2)
  - Towards more complex grammar formalisms: Basic formal language
    theory
    (handout)
  - From context-free grammars to definite clause grammars
    (handout, exercise sheet 3)
  - What to encode in a grammar: A DCG for English
    (handout, project part 1: grammar)
  - How to process with a grammar: Intro to Parsing
    (handout, animated slides)
  - More efficient parsing strategies
    (handout, animated slides, exercise sheet 4)
  - Remembering sub-results: Well-formed substring tables
    (handout, animated slides, exercise sheet 5)
  - Remembering subcomputations: The active chart
    (handout, exercise sheet 6, project part 2: parsing algorithm)
  - Chart parsing with complex categories
    (handout)
- And depending on interest and available time, topics selected
  from:
  - More on complex data structures (from atomic symbols to
    first-order terms to feature structures), term and feature structure
    unification, PATR-II, and chart parsing with complex categories
  - Issues in compositionally building up a semantic structure
  - Typed feature structure based systems (HPSG)
After each lecture, 4-up copies of the slides and the homework
sheets are posted on the course web page in PDF format.
Reading material
- There is a course reader, available from the course web page.
  The reader is intended as a basic guideline for the material covered
  in this course. It is a revised version of the module workbook for
  “Techniques in Natural Language Processing 1” by Chris Mellish, Pete
  Whitelock and Graeme Ritchie, 1994, Department of Artificial
  Intelligence, University of Edinburgh. I would like to thank them for
  permitting me to adapt their material for this course.
- General background reading material:
  - Gerald Gazdar and Chris Mellish (1989): Natural
    Language Processing in Prolog. Wokingham, England et al.:
    Addison-Wesley.
  - Fernando Pereira and Stuart Shieber (1987): Prolog
    and Natural-Language Analysis. Stanford: CSLI Publications.
  - Daniel Jurafsky and James H. Martin (2000): Speech
    and Language Processing. Upper Saddle River, NJ: Prentice
    Hall.
Reading assignment No. 1: Chapter 1 of Jurafsky & Martin (2000)
On-line materials
The code used in the course, as well as links to several UNIX
introductions, Prolog manuals, and tutorials, are available from the
course web page.