ISCL Hauptseminar (Summer semester 2023)
Analyzing Language Development
Abstract:
Going beyond tests designed to assess language abilities, how can we characterize the language proficiency of a first or second language learner and their development? What can the computational linguistic analysis of their language production reveal? In addition to fostering our understanding of language development and the practical goal of ecologically valid proficiency assessment, such analyses are also of immediate relevance for any approach designed to adaptively foster learning.
In this seminar we’ll consider a range of approaches originating in different fields and using different methods. On the one hand, research on first language acquisition computes quantitative metrics such as the Mean Length of Utterance (MLU). Other approaches, such as the (Revised) Developmental Level (D-Level, Lu 2009; Voss 2005), Developmental Sentence Scoring (DSS), and the Index of Productive Syntax (IPSyn, Sagae et al. 2005; Lubetich & Sagae 2014), identify the use and frequency of particular linguistic structures. On the other hand, second language acquisition (SLA) research systematically characterizes learner language in terms of Complexity, Accuracy and Fluency, with a broad range of complexity measures at all levels of linguistic modeling being identifiable by computational linguistic methods. Other SLA approaches define specific developmental sequences and rely on those to support the interpretation of relatively few observations about spontaneous language production in terms of a “Rapid Profile” of proficiency. In a related but more descriptive approach, the English Grammar Profile identifies a broad range of criterial features capturing the emergence of language forms and usage at different levels of proficiency. Depending on the interest of participants, we will also consider characteristics of spoken language in addition to the analyses based on written language.
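As a rough illustration of the kind of quantitative metric mentioned above, the following minimal sketch computes a word-based Mean Length of Utterance. It assumes whitespace tokenization and counts words rather than morphemes, which simplifies the classical morpheme-based measure; the function name and the sample utterances are made up for illustration.

```python
def mean_length_of_utterance(utterances):
    """Return the mean number of whitespace-separated words per utterance.

    Assumption: words stand in for morphemes, and empty utterances are skipped.
    """
    lengths = [len(u.split()) for u in utterances if u.strip()]
    if not lengths:
        raise ValueError("need at least one non-empty utterance")
    return sum(lengths) / len(lengths)


# Hypothetical child utterances, invented for this example.
sample = ["more juice", "doggie go out", "I want that one"]
mlu = mean_length_of_utterance(sample)
print(f"MLU in words: {mlu:.2f}")
```

A morpheme-based variant would need a morphological analyzer in place of the whitespace split.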
Instructor: Detmar Meurers
Office: Room 155, Keplerstraße 2
Email: detmar.meurers@uni-tuebingen.de
Office hours: Fridays, 9–10 (please arrange slot by email beforehand)
Course meets: 4 SWS
Tue & Thu 8:30 – 10:00 (Wilhelmstr. 19, Room 0.01)
(first session 25.4.23)
For people who for reasons beyond their control cannot be in class, the zoom room is: https://zoom.us/j/96719107835
Credit Points:
Core Computational Linguistics Hauptseminar 6 CP, with term paper 9 CP.
Syllabus (this file):
html-Version (http://purl.org/dm/23/ss/hs)
pdf-Version (http://purl.org/dm/23/ss/hs/syllabus.pdf)
Moodle page: https://moodle.zdv.uni-tuebingen.de/course/view.php?id=3303
Please enroll in this course by logging into this Moodle course.
Nature of course and our expectations: This is a research-oriented Hauptseminar, in which we jointly explore the topic. Everyone is expected to
regularly and actively participate in class, read the assigned papers and post a meaningful question on Moodle to the “Discussion Forum” on each reading at the latest on the day before the topic is discussed in class.
explore and present a topic (individually or as part of a group)
thoroughly research the topic, mainly based on the mentioned reference
prepare the presentation with slides and send them to the instructor by email a week before the presentation so that they can be discussed
start a new Moodle thread on the “Discussion Forum” specifying what every course participant should read to prepare for your presentation a week before your presentation
present and discuss the topic in class
if you pursue the 9 CP option, work out a project term paper
in the last week of the semester, select a topic and submit a one-page abstract (e.g., spelling out the analysis goal, data set, features and approach to be used).
Note for Computational Linguistics students: The term paper must be produced in LaTeX using the ACL conference format or the Computational Linguistics journal format; BibTeX must be used for the bibliography.
Academic conduct and misconduct: Research is driven by discussion and free exchange of ideas, motivations, and perspectives. So you are encouraged to work in groups, discuss, and exchange ideas. At the same time, the foundation of the free exchange of ideas is that everyone is open about where they obtained which information. Concretely, this means you are expected to always make explicit when you’ve worked on something as a team – and keep in mind that being part of a team always means sharing the work.
For text you write, you always have to provide explicit references for any ideas or passages you reuse from somewhere else. Note that this includes text “found” on the web, where you should cite the url of the web site in case no more official publication is available.
Class etiquette: Please do not read or work on materials for other classes in our seminar. All portable electronic devices such as cell phones and laptops should be switched off for the entire length of the flight, oops, class.
Sessions
25. April – 16. May: Introduction and Overview (Detmar)
Tue 23. May: Phraseological Complexity (Celine Kimball)
Thu 25. May: Morphological Complexity (Christiana Chaidaridou)
Tue 13. June: Writing development (Yushan Li)
Thu 15. June: Writing development (Chi Kuan Lai)
Tue 20. June: Linguistic complexity and accuracy in academic language development (Annika Ott)
Thu 22. June: Discourse Aspects and Cohesion/Coherence in Language Development (Hiu Yan Yip)
Tue 27. June: Development of Sign Language (Svenja Schulze)
Thu 29. June: Linguistic Complexity analysis in a tutoring system (Daniel Capkan) (Michaud & McCoy 1999)
Tue 4. July: English Grammar Profile (Ayodeji Olupinla)
Thu 6. July: Criterial Features (Daniela Verratti Souto)
Tue 11. July: Longitudinal Analysis of Language Development (Megan Horikawa)
Thu 13. July: Longitudinal Analysis of Language Development (Annica Skupch)
Tue 18. July: Linguistic complexity analysis for adaptive Information Retrieval (Aron Winkler) (Chen & Meurers 2019; Chen et al. 2022)
Thu 20. July: Proficiency classification (Diana-Constantina Höfels)
Tue 25. July (finals week)
Thu 27. July (finals week)
Topics (first sketch: this will develop as the semester proceeds)
First Language Development
IPSyn (Sagae et al. 2005; Lubetich & Sagae 2014)
Linguistic complexity and accuracy in academic language development (Weiss & Meurers 2019)
Traceback Method (Kol et al. 2014; Hartmann et al. 2021)
Complexity in Second Language Acquisition research (background: Skehan 1989; Wolfe-Quintero et al. 1998; Ortega 2003; Housen & Kuiken 2009)
Lexical complexity (Laufer & Nation 1995) (more: Malvern et al. 2004; Read & Nation 2004; McCarthy & Jarvis 2010; Lu 2012; Kyle & Crossley 2015; Yoon et al. 2012)
Morphological complexity (Brezina & Pallotti 2019)
Phraseological complexity (Paquot 2019; O’Donnell & Römer 2009)
Syntactic complexity (Covington et al. 2006; Lu 2010)
Discourse/Cohesion
Longitudinal analysis (Vyatkina 2012; Vyatkina et al. 2015)
Task effects (Alexopoulou et al. 2017)
Writing development (Lu 2011; Ai & Lu 2013; Lu & Ai 2015; Crossley & McNamara 2014; Kerz et al. 2020), Academic L2 writing development (Sladoljev-Agejev & Šnajder 2017)
Grießhaber Profilanalyse (Ehl et al. 2018; Grießhaber 2013, 2019)
Criterial Features (Hawkins & Buttery 2010; Hawkins & Filipovic 2012; Tono 2013; Wisniewski 2017)
English Grammar Profile (Harrison 2015), Profile Deutsch (Glaboniat et al. 2002)
Rapid Profile (Mackey et al. 1991; Keßler 2007; Alshahrani 2008)
Miscellaneous proficiency classification (Arhiliuc et al. 2020; Kerz et al. 2021; Pilán et al. 2016; Horbach et al. 2015; Gotsoulia & Dendrinos 2011)
In a tutoring system context (Michaud & McCoy 1999)
Estonian (Vajjala & Lõo 2013, 2014), Swedish (Volodina et al. 2016)
Language testing from a psychometric perspective (Farhady 1982; Powers et al. 1985; Wolf et al. 2008; Zhang 2010; Carroll & Bailey 2016)
Ai, H. & X. Lu (2013). A corpus-based comparison of syntactic complexity in NNS and NS university students’ writing. In A. Díaz-Negrillo, N. Ballier & P. Thompson (eds.), Automatic Treatment and Analysis of Learner Corpus Data, John Benjamins, pp. 249–264.
Alexopoulou, T., M. Michel, A. Murakami & D. Meurers (2017). Task Effects on Linguistic Complexity and Accuracy: A Large-Scale Learner Corpus Analysis Employing Natural Language Processing Techniques. Language Learning 67, 181–209. URL https://doi.org/10.1111/lang.12232.
Alshahrani, A. (2008). Rapid Profile as an alternative ESL placement test. Annual Review of Education, Communication & Language Sciences 5.
Arhiliuc, C., J. Mitrović & M. Granitzer (2020). Language proficiency scoring. In Proceedings of The 12th Language Resources and Evaluation Conference. pp. 5624–5630.
Brezina, V. & G. Pallotti (2019). Morphological complexity in written L2 texts. Second Language Research 35(1), 99–119.
Carroll, P. E. & A. L. Bailey (2016). Do decision rules matter? A descriptive study of English language proficiency assessment classifications for English-language learners and native English speakers in fifth grade. Language Testing 33(1), 23–52.
Chen, X. & D. Meurers (2019). Linking text readability and learner proficiency using linguistic complexity feature vector distance. Computer-Assisted Language Learning 32(4), 418–447. https://doi.org/10.1080/09588221.2018.1527358.
Chen, X., D. Meurers & P. Rebuschat (2022). ICALL offering individually adaptive input: Effects of complex input on L2 development. Language Learning & Technology 26(1). URL https://hdl.handle.net/10125/73496.
Covington, M. A., C. He, C. Brown, L. Naçi & J. Brown (2006). How complex is that sentence? A proposed revision of the Rosenberg and Abbeduto D-Level Scale. Computer Analysis of Speech for Psychological Research (CASPR) Research Report 2006-01, The University of Georgia, Artificial Intelligence Center, Athens, GA. URL http://www.ai.uga.edu/caspr/2006-01-Covington.pdf.
Crossley, S. A. & D. S. McNamara (2014). Does writing development equal writing quality? A computational investigation of syntactic complexity in L2 learners. Journal of Second Language Writing 26, 66–79.
Díaz Negrillo, A. (2007). A Fine-Grained Error Tagger for Learner Corpora. Ph.D. thesis, University of Jaén, Spain.
Ehl, B., M. Paul, G. Bruns, E. Fleischhauer, M. Vock, A. Gronostaj & M. Grosche (2018). Testgütekriterien der “Profilanalyse nach Grießhaber”. Evaluation eines Verfahrens zur Erfassung grammatischer Fähigkeiten von ein-und mehrsprachigen Grundschulkindern. Zeitschrift für Erziehungswissenschaft 21(6), 1261–1281.
Farhady, H. (1982). Measures of language proficiency from the learner’s perspective. TESOL Quarterly 16(1), 43–59.
Glaboniat, M., M. Müller, P. Rusch, H. Schmitz & L. Wertenschlag (2002). Profile deutsch, vol. 21. Langenscheidt Berlin.
Gotsoulia, V. & B. Dendrinos (2011). Towards a corpus-based approach to modelling language production of foreign language learners in communicative contexts. In Proceedings of the International Conference Recent Advances in Natural Language Processing 2011. pp. 557–561.
Grießhaber, W. (2013). Die Profilanalyse für Deutsch als Diagnoseinstrument zur Sprachförderung. Überblick. Kompetenzzentrum ProDaZ (Online: https://www.unidue.de/imperia/md/content/prodaz/griesshaber_profilanalyse_deutsch.pdf, accessed 06.03.2017).
Grießhaber, W. (2019). 22. Profilanalysen. In Sprachdiagnostik Deutsch als Zweitsprache, De Gruyter Mouton, pp. 547–568.
Harrison, J. (2015). The English grammar profile. English Profile in Practice, English Profile Studies 5, 28–48.
Hartmann, S., N. Koch & A. E. Quick (2021). The traceback method in child language acquisition research: identifying patterns in early speech. Language and Cognition 13(2), 227–253.
Hawkins, J. & L. Filipovic (2012). Criterial Features in L2 English. Cambridge: Cambridge University Press.
Hawkins, J. A. & P. Buttery (2010). Criterial Features in Learner Corpora: Theory and Illustrations. English Profile Journal.
Horbach, A., J. Poitz & A. Palmer (2015). Using shallow syntactic features to measure influences of L1 and proficiency level in EFL writings. In Proceedings of the fourth workshop on NLP for computer-assisted language learning. pp. 21–34.
Housen, A. & F. Kuiken (2009). Complexity, Accuracy and Fluency in Second Language Acquisition. Applied Linguistics 30(4), 461–473.
Kerz, E., Y. Qiao, D. Wiechmann & M. Ströbel (2020). Becoming linguistically mature: Modeling English and German children’s writing development across school grades. In Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications. pp. 65–74.
Kerz, E., D. Wiechmann, Y. Qiao, E. Tseng & M. Ströbel (2021). Automated classification of written proficiency levels on the CEFR-scale through complexity contours and RNNs. In Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications. pp. 199–209.
Keßler, J.-U. (2007). Assessing EFL-development online: A feasibility study of Rapid Profile. Second language acquisition research: Theory-construction and testing pp. 119–144.
Kol, S., B. Nir & S. Wintner (2014). Computational evaluation of the Traceback Method. Journal of Child Language 41(1), 176–199.
Kyle, K. (2016). Measuring Syntactic Development in L2 Writing: Fine Grained Indices of Syntactic Complexity and Usage-Based Indices of Syntactic Sophistication. Ph.D. thesis, Georgia State University. URL http://scholarworks.gsu.edu/alesl_diss/35.
Kyle, K. & S. A. Crossley (2015). Automatically Assessing Lexical Sophistication: Indices, Tools, Findings, and Application. TESOL Quarterly 49(4), 757–786.
Laufer, B. & P. Nation (1995). Vocabulary Size and Use: Lexical Richness in L2 Written Production. Applied Linguistics 16(3), 307–322. URL http://applij.oxfordjournals.org/content/16/3/307.abstract.
Lu, X. (2009). Automatic measurement of syntactic complexity in child language acquisition. International Journal of Corpus Linguistics 14(1), 3–28.
Lu, X. (2010). Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics 15(4), 474–496.
Lu, X. (2011). A Corpus-Based Evaluation of Syntactic Complexity Measures as Indices of College-Level ESL Writers’ Language Development. TESOL Quarterly 45(1), 36–62.
Lu, X. (2012). The Relationship of Lexical Richness to the Quality of ESL Learners’ Oral Narratives. The Modern Language Journal pp. 190–208.
Lu, X. & H. Ai (2015). Syntactic complexity in college-level English writing: Differences among writers with diverse L1 backgrounds. Journal of Second Language Writing 29, 16–27.
Lubetich, S. & K. Sagae (2014). Data-driven Measurement of Child Language Development with Simple Syntactic Templates. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers. Dublin, Ireland: Dublin City University and Association for Computational Linguistics, pp. 2151–2160. URL http://aclweb.org/anthology/C14-1203.
Mackey, A., M. Pienemann & I. Thornton (1991). Rapid Profile: A second language screening procedure. Working Papers of the National Languages Institute of Australia 1(1), 61–82.
Malvern, D., B. Richards, N. Chipere & P. Durán (2004). Lexical diversity and language development: Quantification and assessment. Palgrave Macmillan.
McCarthy, P. M. & S. Jarvis (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods 42(2), 381–392.
Michaud, L. N. & K. F. McCoy (1999). Modeling User Language Proficiency in a Writing Tutor for Deaf Learners of English. In Proceedings of the Symposium on Computer-Mediated Language Assessment and Evaluation in Natural Language Processing, an ACL-IALL Workshop. University of Maryland, College Park, Maryland, pp. 47–54.
O’Donnell, M. B. & U. Römer (2009). Proficiency development and the phraseology of learner language. Paper Presented at the 30th ICAME Conference 2009. Lancaster, UK. 27–31 May 2009.
Ortega, L. (2003). Syntactic complexity measures and their relationship to L2 proficiency: A research synthesis of college-level L2 writing. Applied Linguistics 24(4), 492–518.
Paquot, M. (2019). The phraseological dimension in interlanguage complexity research. Second Language Research 35(1), 121–145.
Pilán, I., E. Volodina & T. Zesch (2016). Predicting proficiency levels in learner writings by transferring a linguistic complexity model from expert-written coursebooks. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. pp. 2101–2111.
Powers, S., D. M. Johnson, H. B. Slaughter, C. Crowder & P. B. Jones (1985). Reliability and validity of the language proficiency measure. Educational and Psychological Measurement 45(4), 959–963.
Read, J. & P. Nation (2004). Measurement of formulaic sequences. Formulaic sequences: Acquisition, processing and use pp. 23–35.
Sagae, K., A. Lavie & B. MacWhinney (2005). Automatic measurement of syntactic development in child language. In Proceedings of the 43rd Meeting of the Association for Computational Linguistics (ACL-05). Ann Arbor, MI.
Skehan, P. (1989). Individual Differences in Second Language Learning. Edward Arnold.
Sladoljev-Agejev, T. & J. Šnajder (2017). Using analytic scoring rubrics in the automatic assessment of college-level summary writing tasks in L2. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers). pp. 181–186.
Tono, Y. (2013). Criterial feature extraction using parallel learner corpora and machine learning. In A. Díaz-Negrillo, N. Ballier & P. Thompson (eds.), Automatic Treatment and Analysis of Learner Corpus Data, John Benjamins, pp. 169–204.
Vajjala, S. & K. Lõo (2013). Role of Morpho-syntactic features in Estonian Proficiency Classification. In Proceedings of the 8th Workshop on Innovative Use of NLP for Building Educational Applications (BEA8), Association for Computational Linguistics. URL http://aclweb.org/anthology/W13-1708.pdf.
Vajjala, S. & K. Lõo (2014). Automatic CEFR level prediction for Estonian learner text. In Proceedings of the third workshop on NLP for computer-assisted language learning. pp. 113–127.
Volodina, E., I. Pilán, L. Llozhi, B. Degryse & T. François (2016). SweLLex: second language learners’ productive vocabulary. In Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition. pp. 76–84.
Voss, M. J. (2005). Determining Syntactic Complexity Using Very Shallow Parsing. Research Report 2005-01, Computer Analysis of Speech for Psychological Research (CASPR), Institute for Artificial Intelligence, The University of Georgia. URL http://www.ai.uga.edu/caspr/2005-01-Voss.pdf. Published version of MSc thesis.
Vyatkina, N. (2012). The Development of Second Language Writing Complexity in Groups and Individuals: A Longitudinal Learner Corpus Study. The Modern Language Journal 96(4), 576–598. URL https://doi.org/10.1111/j.1540-4781.2012.01401.x.
Vyatkina, N., H. Hirschmann & F. Golcher (2015). Syntactic modification at early stages of L2 German writing development: A longitudinal learner corpus study. Journal of Second Language Writing.
Weiss, Z. & D. Meurers (2019). Analyzing Linguistic Complexity and Accuracy in Academic Language Development of German across Elementary and Secondary School. In Proceedings of the 14th Workshop on Innovative Use of NLP for Building Educational Applications (BEA). Florence, Italy: Association for Computational Linguistics.
Wisniewski, K. (2017). Empirical Learner Language and the Levels of the Common European Framework of Reference. Language Learning 67(S1), 232–253. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/lang.12223.
Wolf, M. K., T. Farnsworth & J. Herman (2008). Validity issues in assessing English language learners’ language proficiency. Educational Assessment 13(2-3), 80–107.
Wolfe-Quintero, K., S. Inagaki & H.-Y. Kim (1998). Second Language Development in Writing: Measures of Fluency, Accuracy & Complexity. Honolulu: Second Language Teaching & Curriculum Center, University of Hawaii at Manoa. URL https://doi.org/10.2307/3587656.
Yoon, S.-Y., S. Bhat & K. Zechner (2012). Vocabulary profile as a measure of vocabulary sophistication. In Proceedings of the seventh workshop on building educational applications using NLP. pp. 180–189.
Zhang, B. (2010). Assessing the accuracy and consistency of language proficiency classification under competing measurement models. Language Testing 27(1), 119–140.
Example learner corpus: NOCE (Díaz Negrillo 2007)
Short essays written by Spanish 1st and 2nd year students of English, annotated with editing and error tags
998 texts, 337,332 tokens (149,256 types)
Some text on the importance of knowing a foreign language: