Building Educational Applications 2019 Shared Task: Grammatical Error Correction

NEW! 25/01/2019: Training data released!

CALL FOR PARTICIPATION

================================================================================

Building Educational Applications 2019 Shared Task:
Grammatical Error Correction
Florence, Italy
August 2, 2019

https://www.cl.cam.ac.uk/research/nl/bea2019st/

================================================================================
Call for Participation
================================================================================

Grammatical error correction (GEC) is the task of automatically
correcting grammatical errors in text; e.g. [I follows his advices -> I
followed his advice]. It can be used not only to help language learners
improve their writing skills, but also to alert native speakers to
accidental mistakes or typos.
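
To make the bracketed example concrete, the sketch below derives
token-level edits from an original sentence and its correction. It uses
Python's standard difflib as a stand-in aligner; this is only an
illustration and not the alignment used by any particular GEC toolkit.
The function name token_edits is hypothetical.

```python
from difflib import SequenceMatcher


def token_edits(source: str, target: str):
    """Return (original_tokens, corrected_tokens) pairs for every
    non-matching span between two whitespace-tokenised sentences."""
    src, tgt = source.split(), target.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # replace, insert or delete
            edits.append((" ".join(src[i1:i2]), " ".join(tgt[j1:j2])))
    return edits


if __name__ == "__main__":
    for original, corrected in token_edits(
        "I follows his advices", "I followed his advice"
    ):
        print(f"{original!r} -> {corrected!r}")
```

An insertion shows up as an empty original span and a deletion as an
empty corrected span.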

GEC gained significant attention in the Helping Our Own (HOO) and CoNLL
shared tasks between 2011 and 2014, but has since become more difficult
to evaluate given a lack of standardised experimental settings. In
particular, recent systems have been trained, tuned and tested on
different combinations of corpora using different metrics. One of the
aims of this shared task is hence to once again provide a platform where
different approaches can be trained and tested under the same
conditions.

Another significant problem facing the field is that system performance
is still primarily benchmarked against the CoNLL-2014 test set, even
though this 5-year-old dataset only contains 50 essays on 2 different
topics written by 25 South-East Asian undergraduates in Singapore. This
means that systems have increasingly overfit to a very specific genre of
English and so do not generalise well to other domains. As a result,
this shared task introduces the Cambridge English Write & Improve (W&I)
corpus, a new error-annotated dataset that represents a much more
diverse cross-section of English language levels and domains. Write &
Improve is an online web platform that assists non-native English
students with their writing (https://writeandimprove.com/).

Participating teams will be provided with training and development data
from the W&I corpus to build their systems. Depending on the chosen
track, supplementary data may also be used. System output will be
evaluated on a blind test set using ERRANT
(https://github.com/chrisjbryant/errant).
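
As a rough illustration of edit-based scoring, the sketch below compares
a set of system edits with a set of reference edits and computes
precision, recall and an F-score. The call does not state which score
will be reported; the default beta=0.5 (weighting precision above
recall) and the convention of scoring an empty denominator as 1.0 are
assumptions made for this example, and the function name
edit_f_score is hypothetical. It is not ERRANT's own code.

```python
def edit_f_score(hyp_edits, ref_edits, beta=0.5):
    """Precision, recall and F-beta over exact-match edit sets.

    Each edit is any hashable value, e.g. (start, end, correction).
    beta=0.5 is an assumed default, not taken from the call.
    """
    hyp, ref = set(hyp_edits), set(ref_edits)
    tp = len(hyp & ref)   # edits proposed and present in the reference
    fp = len(hyp - ref)   # edits proposed but not in the reference
    fn = len(ref - hyp)   # reference edits the system missed
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    denom = beta ** 2 * precision + recall
    f = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f


if __name__ == "__main__":
    reference = {(1, 2, "followed"), (3, 4, "advice")}
    system = {(1, 2, "followed")}
    print(edit_f_score(system, reference))
```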

In addition to learner data, we will provide an annotated development
and test set extracted from the LOCNESS corpus, a collection of essays
written by native English students compiled by the Centre for English
Corpus Linguistics at the University of Louvain.

Tracks
------
There are 3 tracks in the BEA 2019 shared task. Each track controls the
amount of annotated data that can be used in a system. We place no
restrictions on the amount of unannotated data that can be used (e.g.
for language modelling).

* Restricted
In the restricted setting, participants may only use the following
annotated datasets: FCE, Lang-8 Corpus of Learner English, NUCLE, W&I
and LOCNESS.
Note that we restrict participants to the preprocessed Lang-8 Corpus
of Learner English rather than the raw, multilingual Lang-8 Learner
Corpus because participants would otherwise need to filter the raw
corpus themselves.

* Unrestricted
In the unrestricted setting, participants may use any and all
datasets, including those in the restricted setting.

* Unsupervised (or minimally supervised)
In the unsupervised setting, participants may not use any annotated
training data. Since current state-of-the-art systems rely on as much
training data as possible to reach the best performance, the goal of the
unsupervised track is to encourage research into systems that do not
rely on annotated training data. This track should be of particular
interest to researchers working with low-resource languages. However,
since we also expect this to be a challenging track, we will allow
participants to use the W&I+LOCNESS development set to develop their
systems.
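
The track rules above can be summarised as a small check on the
annotated training corpora a system uses. This is a minimal sketch: the
function name and the track labels as strings are hypothetical, and the
corpus names are simply those listed for the restricted track.

```python
from typing import Iterable

# Annotated corpora permitted in the restricted track, as listed above.
RESTRICTED_CORPORA = frozenset(
    {"FCE", "Lang-8 Corpus of Learner English", "NUCLE", "W&I", "LOCNESS"}
)


def training_data_allowed(track: str, annotated_corpora: Iterable[str]) -> bool:
    """Check a system's annotated *training* corpora against a track's rule.

    Unannotated data is unrestricted in every track, so it is not modelled.
    Using the W&I+LOCNESS development set for development in the
    unsupervised track is permitted and is not counted as training data.
    """
    used = set(annotated_corpora)
    if track == "unrestricted":
        return True                        # any and all datasets
    if track == "restricted":
        return used <= RESTRICTED_CORPORA  # only the listed corpora
    if track == "unsupervised":
        return not used                    # no annotated training data
    raise ValueError(f"unknown track: {track!r}")


if __name__ == "__main__":
    print(training_data_allowed("restricted", ["FCE", "NUCLE"]))
    print(training_data_allowed("restricted", ["Lang-8 Learner Corpus (raw)"]))
    print(training_data_allowed("unsupervised", []))
```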

Participation
-------------
In order to participate in the BEA 2019 Shared Task, teams are required
to submit their system output between March 25 and March 29, 2019; the
submission window closes at 23:59 GMT on March 29. There is no explicit
registration procedure. Further details about the submission process
will be provided soon.

Important Dates
---------------
Friday, Jan 25, 2019: New training data released
Monday, March 25, 2019: New test data released
Friday, March 29, 2019: System output submission deadline
Friday, April 12, 2019: System results announced
Friday, May 3, 2019: System paper submission deadline
Friday, May 17, 2019: Review deadline
Friday, May 24, 2019: Notification of acceptance
Friday, June 7, 2019: Camera-ready submission deadline
Friday, August 2, 2019: BEA-2019 Workshop (Florence, Italy)

Organisers
----------
Christopher Bryant, University of Cambridge
Mariano Felice, University of Cambridge
Øistein Andersen, University of Cambridge
Ted Briscoe, University of Cambridge

Contact
-------
Questions and queries about the shared task can be sent to
bea2019st@gmail.com.

Further details can be found at
https://www.cl.cam.ac.uk/research/nl/bea2019st/

IJLCR 4.2 out!

Tense and aspect in Second Language Acquisition and learner corpus research: Introduction to the special issue
Robert Fuchs and Valentin Werner
143–163

Articles

The progressive form and its functions in spoken learner English: Tracing the effects of an exposure-rich learning environment
Lea Meriläinen
164–194

The use of stative progressives by school-age learners of English and the importance of the variable context: Myth vs. (corpus) reality
Robert Fuchs and Valentin Werner
195–224

Progressive or not progressive?: Modeling the constructional choices of EFL and ESL writers
Paula Rautionaho and Sandra C. Deshors
225–252

Arabic learners’ acquisition of English past tense morphology: Lexical aspect and phonological saliency
Helen Zhao and Yasuhiro Shirai
253–276

Can native-speaker corpora help explain L2 acquisition of tense and aspect?: A study of the “input”
Nicole Tracy-Ventura and Jhon A. Cuesta Medina
277–300

Vocab@Leuven – Call for proposals!

It is our pleasure to invite you to the Vocab@Leuven conference.

The conference will be hosted by KU Leuven, Belgium, and will take place from 1 to 3 July 2019. It will be the third Vocab@ conference, following the successful editions of Vocab@Vic in Wellington in 2013 and Vocab@Tokyo in 2016.

The call for proposals is now open!
We welcome proposals related to any topic about second/foreign language vocabulary.
The scope of the topics will range from vocabulary teaching and assessment to corpus-linguistic, psycholinguistic and neurolinguistic approaches to vocabulary.
The deadline for submitting abstracts is 15 December 2018. Detailed guidelines can be found on our website: https://vocabatleuven.wordpress.com/call-for-abstracts/

Please visit our website for more information on the keynote speakers and invited symposia: https://vocabatleuven.wordpress.com/

See you at Vocab@Leuven,

On behalf of the organizing committee
Elke Peters

Questions?: Contact us at vocabatleuven@kuleuven.be

PhD fellowship in Corpus Linguistics and Second Language Acquisition

The Centre for English Corpus Linguistics has an opening for a PhD fellowship for a total period of four years, starting as of October 2018 (later is also a possibility).

The position is part of the UCLouvain FSR-funded research project Particle placement and genitive alternations in EFL learner spoken syntax: core probabilistic grammar and/or L1-specific preferences? (Promotor: Dr. Magali Paquot). The project stems from collaborative work between the promotor, Prof. B. Szmrecsanyi (KU Leuven) and Dr. J. Grafmiller (University of Birmingham) (e.g. Paquot, Grafmiller & Szmrecsanyi (2017)).

The PhD student will investigate the extent to which English as a Foreign Language (EFL) learners share a core probabilistic grammar (cf. Bresnan, 2007) with users of first and second language varieties of English by analyzing variation in grammatical constraints on the particle placement alternation (for transitive phrasal verbs) and the genitive alternation in corpora of EFL learner spoken language. Methodologically, the candidate will build on annotation guidelines developed by Szmrecsanyi, Grafmiller and colleagues to describe the predictors that may influence speakers’ choice governing the alternations; s/he will also be expected to use a range of variationist analysis techniques.

Job description:

The research project is a joint venture between the Centre for English Corpus Linguistics (CECL) at UCLouvain and the Quantitative Lexicology and Variational Linguistics (QLVL) group at KU Leuven. The candidate will be affiliated to the Institut Langage et Communication (ILC, UCLouvain) and will prepare a joint UCLouvain-KU Leuven PhD in Linguistics.

Activities that the candidate will perform include:

– develop and implement (i) theoretical concepts in line with the focus of the research project and (ii) appropriate methodological procedures for investigating these concepts;

– conduct corpus-based analyses of L1 and L2 written and spoken samples;

– interpret the results of the analyses and report on the project in conference presentations and academic publications;

– carry out a research stay at the University of Birmingham (to work in close collaboration with Dr. J. Grafmiller);

– by the end of the four-year term, submit and defend a PhD dissertation based on the project.

Requirements and profile:

– Master's degree in Linguistics, Applied Linguistics, Language & Literature, Natural Language Processing or Language Learning and Teaching;

– excellent record of BA and MA level study;

– excellent command of English;

– excellent and demonstrated analytic skills;

– ability to work with common software packages (including MS Word, Excel and PowerPoint);

– basic knowledge of corpus-linguistic techniques is a requirement;

– knowledge of statistics and statistical software is an asset;

– programming skills in Perl, Python or R are also an asset;

– excellent and demonstrated self-management skills, ability and willingness to work in a team;

– willingness to live in or near Louvain-la-Neuve and to travel abroad (for short-term research stays and to attend international academic conferences).

Terms of employment:

– The contract will initially be for one year, renewable three times, for a total of four years.

– The candidate receives a doctoral fellowship grant (starting at approx. EUR 1900 net per month) and full medical insurance.

– The candidate will be expected to apply for an FNRS position after the first year.

– The position requires residence in Belgium.

– Applicants from outside the EU are responsible for obtaining the necessary visa or permits, with the assistance of the UCLouvain staff department.

Application deadline: Review of applications will begin on 20 August 2018 and continue until the position is filled.

Please include with your application:

– a cover letter in English, in which you specify why you are interested in this position and how you meet the job requirements outlined above;

– a curriculum vitae in English;

– a concise academic statement in English in which you outline your expectations about and plans for graduate study and career goals;

– a copy of BA and MA diplomas and degrees;

– a copy of your master's thesis and academic publications (if applicable);

– the names and full contact details of two academic referees.

Shortlisted candidates will be invited for an interview (in situ or via video conferencing) in September 2018 (or later).

Applications (as an email attachment) and inquiries should be addressed to:

Dr. Magali Paquot
Centre for English Corpus Linguistics
Université Catholique de Louvain
Email: magali.paquot@uclouvain.be

References

Bresnan, J. (2007). Is syntactic knowledge probabilistic? Experiments with the English dative alternation. In S. Featherston and W. Sternefeld (eds.), Roots: Linguistics in Search of its Evidential Base. Berlin: Mouton de Gruyter, 75-96.

Paquot, M., Grafmiller, J. & Szmrecsanyi, B. (2017). Particle placement alternation in EFL learner speech vs. native and ESL spoken Englishes: core probabilistic grammar and/or L1-specific preferences? Paper presented at the 4th Learner Corpus Research Conference, 5-7 October 2017, Bolzano, Italy.

IJLCR 4.1 just out!

VOCAB@Leuven

The third Vocab@ conference will be hosted by KU Leuven from 1 to 3 July 2019.

Previous Vocab@ conferences were held at Meiji Gakuin University in Tokyo in 2016 and at Victoria University of Wellington in 2013.

The Vocab@Leuven conference aims to bring together researchers from different disciplines who investigate the learning, processing, teaching, and testing of second/foreign language vocabulary.

Confirmed plenary speakers:

  • Batia Laufer (University of Haifa)
  • Marc Brysbaert (Ghent University)

Organizing committee:

  • Elke Peters
  • Paul Pauwels
  • Maribel Montero Perez
  • Eva Puimège
  • Ann-Sophie Noreillie
  • Thao Duong

The program will consist of paper presentations, poster sessions, invited colloquia and invited plenary speakers.

Types of presentations:

  • individual paper (20 + 10 minutes)
  • poster

We invite abstracts for paper and poster presentations about any topic related to second/foreign language vocabulary:

Strands
  1. vocabulary teaching (classroom-based research, technology-based, formal/informal learning, …)
  2. vocabulary assessment
  3. vocabulary and the skills of reading, listening, TV viewing, writing and speaking
  4. formulaic language
  5. corpus approaches to vocabulary
  6. psycholinguistic approaches to vocabulary
  7. neurolinguistic approaches to vocabulary
  8. vocabulary for specialized use (academic, business, technical, etc.)
  9. vocabulary resources (word lists, dictionaries, …)
  10. vocabulary and genre/register

Submission deadline: December 15, 2018

Research Associate – Corpus-computational linguistics and second language acquisition, University of Cambridge

A Post-Doctoral Research Associate position is available at the University of Cambridge to work with Dr Dora Alexopoulou and Professor Ianthi Tsimpli on the Leverhulme Trust grant “Linguistic typology and learnability in second language”.

The project aims to measure the impact of linguistic distance between L1 and L2 on the learning success of learners from different linguistic backgrounds. The empirical research will consist of a comprehensive corpus analysis exploiting the EF-Cambridge Open Language Database (EFCAMDAT) and the International Corpus of Learner English (ICLE). The main responsibility of the postdoctoral researcher is to conduct the corpus investigation and work with the investigators for the analysis of data and research publications.

The ideal candidate will have a strong background in corpus linguistics and a proven ability to work with NLP tools for extracting grammatical features and structures and to apply statistical analyses to the data. Experience with learner corpora and second language learning research is particularly welcome. Candidates will have completed a PhD in a relevant field.

The postdoctoral researcher will join the EF Research Lab for Applied Language Learning, the Processing and Language Acquisition Group and the Language Technology Group and become part of a multidisciplinary research community within Linguistics, and, more widely, within Cambridge Language Sciences.

This position is 100% FTE and funds for this post are available for 30 months in the first instance, starting on 16 July 2018 or as soon as possible thereafter. The successful candidate will be a member of the Linguistics Section of the Faculty of Modern and Medieval Languages.

The closing date for applications is midnight (BST) on 31st May 2018. Interviews will be held on 18 June 2018 at the Faculty of Modern and Medieval Languages, University of Cambridge, subject to confirmation.

For further information please visit

http://www.jobs.cam.ac.uk/job/17168/

JOB OPENING: PHD STUDENT / RESEARCHER IN ENGLISH LINGUISTICS

We are inviting applications for a three-year, half-time (0.5) PhD student position (E 13 TV-L on the German salary scale) in English linguistics at the University of Bremen, to be filled by 1 July 2018 or later.

Applicants should have an MA in English (Applied) Linguistics (or a teaching degree with a focus on and final thesis in English linguistics) and a strong background and interest in either (learner) corpus research, Second Language Acquisition or varieties of English. Successful applicants are expected to carry out research on a dissertation in (applied) English linguistics. The teaching load comprises 2 hours per semester.

Further information about the research unit can be found online at http://www.fb10.uni-bremen.de/anglistik/linguistik/default.aspx. For further enquiries write to the e-mail address specified below.

Non-native speakers of German should have at least a working knowledge of German. Detailed information about the German salary scale can be found here: http://www.uni-bremen.de/en/international/ways-to-the-university-of-bremen/welcome-centre-international-researchers/useful-information/financing.html

Application deadline: 15 June 2018

Mailing address for applications (e-mail applications with letter of motivation, dissertation proposal, CV, and diplomas attached as a single PDF are preferred):

Prof. Dr. Marcus Callies
University of Bremen, Faculty of Linguistics and Literary Studies
English-Speaking Cultures
Bibliothekstraße 1, GW 2
D-28359 Bremen
callies@uni-bremen.de


Prof. Dr. Marcus Callies
Universität Bremen
FB 10 – Sprach- und Literaturwissenschaften
English-Speaking Cultures – Arbeitsbereich Anglistik/Sprachwissenschaft
Bibliothekstraße, Gebäude GW 2, Büro A 3400
28359 Bremen
Tel.: +49-421-218-68150
http://www.callies.uni-bremen.de

Learner Corpus Association: http://www.learnercorpusassociation.org/
International Journal of Learner Corpus Research: https://benjamins.com/#catalog/journals/ijlcr/main

Special issue of the IJLCR on Segmental, prosodic and fluency features in phonetic learner corpora

https://benjamins.com/#catalog/journals/ijlcr.3.2/toc

Table of Contents

Introduction
Jürgen Trouvain, Frank Zimmerer, Bernd Möbius, Mária Gósy and Anne Bonneau
105 – 117
Articles
Malte Belz, Simon Sauer, Anke Lüdeling and Christine Mooshammer
118 – 148
Mária Gósy, Dorottya Gyarmathy and András Beke
149 – 174
María Luisa García Lecumberri, Martin Cooke and Mirjam Wester
175 – 195
Ulrike Gut
196 – 222
Sylvain Detey and Isabelle Racine
223 – 249
Oliver Niebuhr, Maria Alm, Nathalie Schümchen and Kerstin Fischer
250 – 277
Correction
Erratum Vol 3, Issue 1
278
List of reviewers
Referees for Volume 3 (2017)
279 – 280