Clippers: A computational linguistics discussion group (Meurers, 795Y, Spring 2005)


Clippers is our forum for informal discussion of all issues related to computational linguistics: from work in progress by visitors and people in the department, through presentations of new papers, to practical concerns such as hints on the use of CL-related software tools.

Everyone with an interest in computational linguistics is most welcome!

To see what happened in previous quarters of Clippers, check out the pages for: Winter 05, Autumn 04, Spring 04, Winter 04, Autumn 03, Spring 03, Winter 03, Autumn 02, Spring 02, Autumn 01

When and where: Tuesdays at 17:30-18:48 in 340 Central Classrooms.

Important: Please be sure to subscribe to our local computational linguistics mailing list on which all Clippers sessions and talks are announced.

The plan, as usual, is to start each session with 5-10 minutes on whatever someone wants to bring up and then to continue with the following topics:

  1. Tue, 29. March.: Organization
  2. Tue, 5. April.: Laura Stoia: Intensional Perspective and Proximity Markers in Collaborative Dialog (CLS talk preview)
  3. Tue, 12. April.: Mohsen Rezayat, Jim Hurt, and Lloyd Fields (DigiLore, Inc.): MATE: A Natural-Language Based Decision-Support and Mentoring System

    Abstract: As a practical matter, information by itself is not sufficient for accomplishing tasks. Information is just data placed into a context. When coupled with interpretation by human experts, information becomes the basis for knowledge and provides the underpinnings for making decisions and executing them. Unfortunately, human experts in various industries are aging, expert knowledge and skills are being lost, and qualified replacements are increasingly difficult to find and train. An effective decision support system is needed to ensure that knowledge never leaves the organization and is always available to address unanticipated problems. Such a system must also reuse the captured knowledge to facilitate training of new staff and enable predictive analysis. Taking this concept a step further, we ask: what if the system could go beyond answering direct questions and actually provide mentoring and advice in natural language on options and alternative approaches? What if the environment could support training and virtual tutoring for new or complex tasks identified by the expert or the advisor? Wouldn't achieving the agility objectives and lean initiatives within any organization be greatly enhanced in such an environment?

    With the help of sophisticated mathematical algorithms and a unique natural-language based user interface, DigiLore is developing MATE to be a digital equivalent to the above environment in several specific domains. Simply put, MATE is the infrastructure to create virtual environments consisting of a digitized Mentor plus an Advisor plus a Tutor plus an Expert. Such a system should help companies address knowledge attrition, training, and decision-support issues.

  4. Tue, 19. April.: Stacey Bailey: Revisiting and revising textual entailment
  5. Tue, 26. April.: Stacey Bailey: More on textual entailment
  6. Tue, 3. May.: Markus Dickinson: Too many rules in this treebank! Finding erroneous rules and why it matters
  7. Tue, 10. May.: Jihyun Park: Modeling Garden-Path Sentence Processing with a Statistical Part-of-Speech Tagger

    Abstract: The main purpose of the current study is to investigate the probabilistic modeling of human sentence processing data using a statistical part-of-speech tagger (POS tagger). In this study, Corley and Crocker's model (2000) was further tested on a larger range of sentences. Processing difficulty reported in sentence processing data was predicted by a bigram POS tagger in two ways: direct comparison of the probability of POS sequences between sentences, or probability re-ranking within a sentence. Word-by-word probability comparison showed that probability drops more sharply at the disambiguating region in garden-path sentences than in control sentences. Probability re-ranking, i.e. selection of a less favored POS sequence, also turned out to be a predictor of processing difficulty. The findings in this study suggest that a statistical POS tagger makes a much more powerful model of human sentence processing than might be expected, given the locality of the probability distribution.

  8. Tue, 17. May.: Jianguo Li: Robust Extraction of Subcategorization Data from Speech Transcripts

    Abstract: This study explores the robustness of familiar NLP components trained on text but applied to transcripts of spoken language. In experiment 1, we showed that for the task of acquiring subcategorization frames (SCFs) the difference in text type (written vs. spoken) does not affect the accuracy level very much. We built a system for automatically extracting SCFs from the spoken and written parts of the BNC separately. Our system achieves comparable accuracy for the spoken and written BNC. In experiment 2, we applied our SCF acquisition system to the ViC corpus (Raymond et al., 2002), which has much more disfluency and uncertainty about utterance segmentation. An analysis of incorrect SCF cues showed that utterance segmentation errors and disfluency tend to cause the parser to make systematic errors, which are reflected in the SCF cues. We developed linguistic heuristics for automatically detecting incorrect SCF cues. We showed that our linguistic heuristics could be applied to different spoken corpora parsed with different statistical parsers.

  9. Tue, 24. May.: (previews of second year papers)
  10. Tue, 31. May.: Mike Daniels: Generalized ID/LP Grammar: A Formalism for Parsing Linearization-based HPSG Grammars (freshly completed PhD dissertation)

Last modified: Thu Apr 14 12:03:58 EDT 2005 - For questions or comments regarding this page, please contact: Detmar Meurers