About Magali Paquot

Affiliation: Centre for English Corpus Linguistics, Institut Langage et Communication, Université catholique de Louvain
Current position: FNRS postdoctoral researcher
Research interests: phraseology, collocations, lexical bundles, L1 influence, L1 identification, pedagogical lexicography, EAP/ESP, writing

International Conference for Learner Corpus Research (LCR 2021) — conference announcement

Dates: 23-25 September 2021


The LCR 2021 conference will be held at the University of Padua (Italy), at the Department of Linguistic and Literary Studies.

The international conference for Learner Corpus Research (LCR) is organised under the aegis of the Learner Corpus Association, which brings together researchers, language instructors and software developers with a common interest in the use of learner corpora for research on second language acquisition, as well as for the enhancement of language pedagogy and language assessment. The conference takes place every two years.

Theme: LCR and Language for Academic Purposes

Papers concerning any aspect of Learner Corpus Research are welcome.

Topics that are within the specific scope of LCR 2021 include, but are not necessarily limited to:

  • Language for Academic Purposes (LAP);
  • Language for Specific Purposes (LSP);
  • English as a Medium of Instruction (EMI);
  • English as a Lingua Franca (ELF);
  • Language Teaching, Assessment and Testing;
  • Corpora as pedagogical resources;
  • Multimodal learner corpora;
  • Software for learner corpus analysis;
  • Corpus-based transfer studies;
  • Data mining and other explorative approaches to learner corpora;
  • Statistical methods in learner corpus studies.


Chairs of the organising committee: Erik Castello and Katherine Ackerley (University of Padua)

Other members: Caroline Clark (University of Padua); Francesca Coccetta (Università Ca’ Foscari of Venice); Fiona Dalziel (University of Padua); Marta Guarda (University of Padua); María Belén Díez Bedmar (University of Jaén).

The link to the conference webpage and information about abstract submission will be provided later on.

Confirmed plenary speakers:

Silvia Bernardini (Università di Bologna – Italy)

Ken Hyland (University of East Anglia – UK)

Anke Lüdeling (Humboldt-Universität zu Berlin – Germany)


Department of Linguistic and Literary Studies, University of Padua

Via Beato Pellegrino, 28

35137 – Padua (Italy)


American Association for Corpus Linguistics (AACL) 2020 – CfP

September 18 & 19 with workshops on September 17

Call for Papers

The 15th International American Association for Corpus Linguistics Conference (AACL2020) will take place September 18-19, 2020, at Northern Arizona University. Previous AACL conferences have been held at the University of Michigan (1999, 2005), Northern Arizona University (2000, 2006, 2014), University of Massachusetts Boston (2001), IUPUI (2002), Montclair State University (2004), Brigham Young University (2008), University of Alberta (2009), Georgia State University (2011, 2018), San Diego State University (2013), and Iowa State University (2016). There will also be a pre-conference workshop day on September 17.

Check out the conference website:



Laurence Anthony, Waseda University

Shelley Staples, University of Arizona

Stefanie Wulff, University of Florida


Main conference general program

Call for papers

We invite contributions relating to all aspects of corpus linguistic research, application, or methods. There are four categories of proposals: full papers, posters, panels, and pre-conference workshops. All proposals will be peer-reviewed by the conference program committee. We ask that presenters submit no more than two proposals as first author. The general program will feature three thematic streams:

  1. Linguistic analyses of corpora as they relate to language use (e.g., register/genre variation, lexical and grammatical variation, language varieties, historical change, lexicography)
  2. Application (the use of corpora in language teaching and learning, as well as other applied fields such as testing and legal research)
  3. Tools and methods (corpus creation, corpus annotation, tagging and parsing, corpus analysis software)


Submission categories:

Full papers

Full papers consist of a 20-minute talk followed by 5 minutes for questions and discussion. Submissions should present completed research where substantial results have been achieved. (Work in progress should be submitted as a poster abstract.) Abstracts should be 300 words (maximum), excluding references.


Posters

Posters can present either results of completed research or work in progress. We especially welcome poster abstracts that (a) report on innovative research that is in its early phases, or (b) report on new software or corpus data resources. Abstracts should be 200 words (maximum), excluding references.


Panels

Panels during the main conference offer an opportunity to group related papers together to allow for extended discussions. Proposals for panels should include the abstracts for the individual presentations (300 words max), together with an introductory abstract (200 words max) introducing the overall goals of the panel. Panels will be allocated time slots up to a maximum of 2 hours. Please specify the desired length of time for the panel.

Pre-conference Workshops
Half-day pre-conference workshops will take place on Thursday, September 17. Abstracts (max. 300 words) should include a complete description of the half-day workshop (max. 3 hours).


Submission guidelines:
Submit abstracts by January 30, 2020.

Cover page: Author(s) name(s); Affiliation; Contact information; Title; Submission Category and thematic stream

Abstract page: Submission category; Title; Abstract

Format: MS Word or PDF (the latter is necessary if the abstract contains specialized fonts).

Book announcement: Widening the Scope of Learner Corpus Research

Dear LCA-members,

We are glad to announce that the fifth volume of proceedings in the Corpora and Language in Use series, entitled “Widening the Scope of Learner Corpus Research”, has recently been published. It includes selected papers from the Fourth Learner Corpus Research Conference held in Bolzano/Bozen.

You can find more information at this link: Publication

We wish you pleasant reading.

Best regards,

Andrea & team


Andrea Abel

Eurac Research


Institutsleiterin/Direttrice d’istituto/Head of Institute

Institut für Angewandte Sprachforschung/Istituto di linguistica applicata/Institute for Applied Linguistics

T +39 0471 055 121


Drususallee/Viale Druso 1

I-39100 Bozen/Bolzano




Beyond CEFR level prediction of texts in learner corpora: Exploring feedback to learners and learning analytics

A one-day workshop at Université de Paris, 30 Oct 2019, Olympe de Gouges Building, room 115, first floor (tbc)

Provisional programme

MORNING: discussing our results

9h00 Opening. N. Ballier: The Ulysse PHC project: aims, data and limitations

9h20 Thomas Gaillat: Investigating learner micro-systems and customizing CEFR criterial features: the micro-system feature set and its regex syntax

9h40 Discussion

10h30 Bernardo Stearns (tbc) and Annanda Sousa: The user interface prototype demo
We hope to deliver a Docker and a GitHub version of our user interface that allows you to paste a text, have a coffee while the text is processed, and then get the probability that the text belongs to a given CEFR level.

10h45 Discussion

11h15 Andrew Simpkins: Overfitting? Comparison with a graded corpus
As a preliminary step, we have tested our current user interface with the CEFR ASAG corpus to check whether our model is biased towards the A1 level.

11h30 General discussion

12h15 LUNCH BREAK (poster session at Diderot)
Posters displayed at Diderot and on a shared Google Drive for remote participants (titles tbc).
Thomas Gaillat: the Viz project for visualising metrics
Carlos Balhana (Cambridge): Grammatical Error Correction and Interlanguage Event Representation
Vinogradova et al.: a module for punctuation with the REALEC data
Vinogradova et al.: the REALEC web interface: data, activities, technologies
Volodina et al.: A System Architecture for Intelligent CALL: examples of NLP approaches to Swedish
Nikolay Babakov: recommendation system for CEFR-indexed texts (from Russian to English?)
O’Donnell et al.: The concept approach to learner errors (incl. details on data, NLP techniques used)

AFTERNOON: Learner corpora and beyond: collecting and interpreting learning process and product data

A blueprint is to be circulated pointing out potential future directions.

13h30 STRAND 1: Adding more metrics/NLP-based methods for error detection / problematic areas for learners

15h00 STRAND 2: Exploring the relation between learner corpus annotation, language testing, and individual feedback to learners

16h30 Coffee break

17h00 STRAND 3: Should we try to link learner corpus and learning analytics research – and what is there to be gained? Ideas for tracking development paths? (Fuchs, Götz & Werner 2016) How to develop learner profiles based on student input?

18h15 Closing remarks and future plans

18h30 End of the workshop

Call for participation

For the closing event of a European-funded project, we invite colleagues to share their ideas about the automatic analysis of learner corpora and how it can be applied to interlanguage analysis, CEFR level prediction, and error detection – and extended to support individual feedback to learners and learning analytics.

The morning session will present some of the results of this French-Irish project, “PHC Ulysse 2019”: the features of the EFCAMDAT corpus we used as the first step for our experiments, the methodology we developed, and our main findings. We will present our prototype user interface for automatic detection of CEFR levels and discuss aspects such as overfitting of a model based on the French and Spanish components of EFCAMDAT. We will also discuss the shared task we held on a portion of this

We will discuss posters over lunch, recapitulating some of the issues. Poster presenters are asked to send their A0 PDF by midnight on Oct 15th, summarizing their approach, which may include results previously presented. The afternoon functions as a round table intended to build collaborations and extensions of our project and to discuss potential work packages for a follow-up project. Invited colleagues will summarize their methodologies and share their views on possible next steps.

Admission is free but registration is compulsory (on a first come, first served basis) on this webpage:

The summary of the Ulysse PHC Project can be found here :

Discussants at Diderot:

Taylor Arnold (University of Richmond) is Assistant Professor of Statistics and has a strong interest in NLP as a data scientist and digital humanist.

Detmar Meurers (University of Tübingen) is Professor of Computational Linguistics and head of the research group on Intelligent Computer-Assisted Language Learning there.

Contact person:
Nicolas Ballier

Applications to host LCR2021

Dear LCA members,

Following several requests for an extension of the deadline for applications to host LCR 2021, the LCA board has decided that the deadline will be extended to 5th July 2019.

I’m very much looking forward to seeing as many of you as possible in Warsaw!

Best wishes,

Sylviane Granger

President of the LCA (on behalf of the Board)

CFP: Learner Corpus Symposium (LCSAW4) Sunday, Sep 29, 2019, Kobe, Japan

Following three successful conventions, the 4th Learner Corpus Studies in Asia and the World (LCSAW4) will be held on Sunday, 29 September 2019, at Kobe University Centennial Hall in Japan.


LCSAW4 is organized in cooperation with the ESRC-AHRC project led by Dr. Tony McEnery at Lancaster University, UK.


Invited Speakers

Tony McEnery

Patrick Rebuschat

Padraic Monaghan

Kazuya Saito

John Williams

Aaron Batty

Pascual Pérez-Paredes

Yukio Tono

Shin Ishikawa

Mariko Abe

Yasutake Ishii

Emi Izumi

Masatoshi Sugiura


We now welcome your submissions for the poster session.



LCSAW4 Poster Session CFP


+ Date: Sunday, September 29, 2019

+ Venue: Kobe University Centennial Hall

+ Presentation Type: Poster

+ Language: English

+ Topic: Studies related to L2 learner corpus

+ Publication: Online proceedings with ISSN will be published.

+ Submission: Please send your abstract and short bio by 20 May 2019. If you cannot access the site, please contact the organizer.

+ Notice of acceptance: By the end of May 2019

+ Full paper due: By the end of August 2019

+ Fee: Free




Dr. Shin Ishikawa

Kobe University

Call for applications to host LCR2021

Following the successful LCR conferences in Louvain-la-Neuve (Belgium) in 2011, Bergen (Norway) in 2013, Cuijk (the Netherlands) in 2015 and Bolzano/Bozen (Italy) in 2017, and as we are approaching LCR2019 in Warsaw (Poland) in September this year, the Learner Corpus Association opens the call to host LCR2021.

LCA members interested in hosting LCR2021 should contact the LCA conference officer, Dr. María Belén Díez-Bedmar. She will provide the application form that interested parties should complete and the conference guidelines to be observed in organising the conference. Completed application forms should be emailed back to the LCA conference officer before 01/06/2019.


Looking forward to receiving your applications,

María Belén Díez-Bedmar, LCA conference officer, on behalf of the Learner Corpus Association


Building Educational Applications 2019 Shared Task: Grammatical Error Correction


NEW! 25/01/2019: Training data released!



Building Educational Applications 2019 Shared Task:
Grammatical Error Correction
Florence, Italy
August 2, 2019

Call for Participation

Grammatical error correction (GEC) is the task of automatically
correcting grammatical errors in text; e.g. [I follows his advices -> I
followed his advice]. It can be used not only to help language learners
improve their writing skills, but also to alert native speakers to
accidental mistakes or typos.
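
To make the bracketed example concrete, the sketch below lists the
token-level differences between a learner sentence and its correction.
It is only an illustration: it relies on Python's standard difflib
module rather than on any tool used in the shared task, and the helper
name token_edits is hypothetical.

```python
import difflib


def token_edits(source: str, corrected: str) -> list[tuple[str, str]]:
    """Return (original, replacement) pairs for each differing token span.

    Hypothetical helper for illustration only; it is not part of the
    shared task tooling and performs no error-type classification.
    """
    src = source.split()
    cor = corrected.split()
    # Align the two whitespace-tokenised sentences.
    matcher = difflib.SequenceMatcher(a=src, b=cor, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Insertions give an empty original; deletions an empty replacement.
            edits.append((" ".join(src[i1:i2]), " ".join(cor[j1:j2])))
    return edits


for original, replacement in token_edits(
    "I follows his advices", "I followed his advice"
):
    print(f"{original} -> {replacement}")
```

A real GEC system has to propose the corrected sentence itself; this
alignment step only shows how a source/correction pair can be reduced
to individual edits once the correction is known.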

GEC gained significant attention in the Helping Our Own (HOO) and CoNLL
shared tasks between 2011 and 2014, but has since become more difficult
to evaluate given a lack of standardised experimental settings. In
particular, recent systems have been trained, tuned and tested on
different combinations of corpora using different metrics. One of the
aims of this shared task is hence to once again provide a platform where
different approaches can be trained and tested under the same

Another significant problem facing the field is that system performance
is still primarily benchmarked against the CoNLL-2014 test set, even
though this 5-year-old dataset only contains 50 essays on 2 different
topics written by 25 South-East Asian undergraduates in Singapore. This
means that systems have increasingly overfit to a very specific genre of
English and so do not generalise well to other domains. As a result,
this shared task introduces the Cambridge English Write & Improve (W&I)
corpus, a new error-annotated dataset that represents a much more
diverse cross-section of English language levels and domains. Write &
Improve is an online web platform that assists non-native English
students with their writing.

Participating teams will be provided with training and development data
from the W&I corpus to build their systems. Depending on the chosen
track, supplementary data may also be used. System output will be
evaluated on a blind test set using ERRANT.

In addition to learner data, we will provide an annotated development
and test set extracted from the LOCNESS corpus, a collection of essays
written by native English students compiled by the Centre for English
Corpus Linguistics at the University of Louvain.

There are 3 tracks in the BEA 2019 shared task. Each track controls the
amount of annotated data that can be used in a system. We place no
restrictions on the amount of unannotated data that can be used (e.g.
for language modelling).

* Restricted
In the restricted setting, participants may only use the following
annotated datasets: FCE, Lang-8 Corpus of Learner English, NUCLE, W&I
Note that we restrict participants to the preprocessed Lang-8 Corpus
of Learner English rather than the raw, multilingual Lang-8 Learner
Corpus because participants would otherwise need to filter the raw
corpus themselves.

* Unrestricted
In the unrestricted setting, participants may use any and all
datasets, including those in the restricted setting.

* Unsupervised (or minimally supervised)
In the unsupervised setting, participants may not use any annotated
training data. Since current state-of-the-art systems rely on as much
training data as possible to reach the best performance, the goal of the
unsupervised track is to encourage research into systems that do not
rely on annotated training data. This track should be of particular
interest to researchers working with low-resource languages. Since we
also expect this to be a challenging track however, we will allow
participants to use the W&I+LOCNESS development set to develop their

In order to participate in the BEA 2019 Shared Task, teams are required
to submit their system output any time between March 25-29, 2019 at
23:59 GMT. There is no explicit registration procedure. Further details
about the submission process will be provided soon.

Important Dates
Friday, Jan 25, 2019: New training data released
Monday, March 25, 2019: New test data released
Friday, March 29, 2019: System output submission deadline
Friday, April 12, 2019: System results announced
Friday, May 3, 2019: System paper submission deadline
Friday, May 17, 2019: Review deadline
Friday, May 24, 2019: Notification of acceptance
Friday, June 7, 2019: Camera-ready submission deadline
Friday, August 2, 2019: BEA-2019 Workshop (Florence, Italy)

Christopher Bryant, University of Cambridge
Mariano Felice, University of Cambridge
Øistein Andersen, University of Cambridge
Ted Briscoe, University of Cambridge

Questions and queries about the shared task can be sent to

Further details can be found at

IJLCR4.2 out!

Tense and aspect in Second Language Acquisition and learner corpus research: Introduction to the special issue
Robert Fuchs and Valentin Werner


The progressive form and its functions in spoken learner English: Tracing the effects of an exposure-rich learning environment
Lea Meriläinen
The use of stative progressives by school-age learners of English and the importance of the variable context: Myth vs. (corpus) reality
Robert Fuchs and Valentin Werner
Progressive or not progressive?: Modeling the constructional choices of EFL and ESL writers
Paula Rautionaho and Sandra C. Deshors
Arabic learners’ acquisition of English past tense morphology: Lexical aspect and phonological saliency
Helen Zhao and Yasuhiro Shirai
Can native-speaker corpora help explain L2 acquisition of tense and aspect?: A study of the “input”
Nicole Tracy-Ventura and Jhon A. Cuesta Medina

Vocab@Leuven – Call for proposals!

It is our pleasure to invite you to the Vocab@Leuven conference.

The conference will be hosted by KU Leuven, Belgium, and will take place from 1 to 3 July 2019. It will be the third Vocab@ conference, following the successful editions of Vocab@Vic in Wellington in 2013 and Vocab@Tokyo in 2016.

The call for proposals is now open!
We welcome proposals related to any topic about second/foreign language vocabulary.
Topics will range from vocabulary teaching and assessment to corpus-linguistic, psycholinguistic and neurolinguistic approaches to vocabulary.
The deadline for submitting abstracts is 15 December 2018. Detailed guidelines can be found on our website:

Please visit our website for more information on the keynote speakers and invited symposia:


See you at Vocab@Leuven,

On behalf of the organizing committee
Elke Peters

Questions?: Contact us at