New project to build a learner corpus of Scottish Gaelic beginning at the Universities of Glasgow and Aberdeen

A project to develop an international world-leading framework for the teaching and learning of Scottish Gaelic is being launched at the Universities of Glasgow and Aberdeen. The project is part of the Soillse research network, the National Research Network for the maintenance and revitalisation of Gaelic language and culture.

The project, Comasan Labhairt ann an Gàidhlig (CLAG) / Gaelic Adult Proficiency (GAP), will ensure that Gaelic adult learners are provided with a crucial resource on par with those for other European languages, including English, Dutch, and Irish.

CLAG is linked to the Common European Framework of Reference for Languages (CEFR), and will describe proficiency scales in Gaelic from beginner to advanced level. It will also be used by language teachers and learners alike to gauge language learning and ability in spoken Gaelic.

The framework will help to maximise the number of Gaelic learners reaching fluency by providing clear learning targets and helping them identify areas in which their spoken Gaelic skills can be improved. It will also be aligned with existing Scottish Qualifications Authority qualifications, and will draw on a wealth of research previously conducted for Gaelic and other European languages.

Led by Professor Roibeard Ó Maolalaigh and Nicola Carty, both of the University of Glasgow, together with Soillse members Dr. Michelle Macleod and Dr. Marsaili MacLeod of the University of Aberdeen, the project will run for three years, supported by the Scottish Funding Council and Bòrd na Gàidhlig.

Rob Ó Maolalaigh, Professor of Gaelic at the University of Glasgow, said: ‘CLAG will be the first empirically derived framework to provide an objective means of describing Gaelic spoken language skills and will provide a much-needed scientific framework upon which new effective pedagogical resources can be created.’

Michelle Macleod, Senior Lecturer in Gaelic and Soillse Co-Director at the University of Aberdeen, said: ‘We are delighted to be working with colleagues in the University of Glasgow on this exciting research project and are grateful to the Scottish Funding Council and Bòrd na Gàidhlig for their support. This new project builds on the existing knowledge-base of the Gaelic adult learner sector in the Soillse network and will have significant impact for adult learners and teachers of Gaelic.’

Bòrd na Gàidhlig said: ‘Information is currently being sought from learners on their needs to support them on their journey to Gaelic fluency, which will inform this project and future projects. Discussions with learners and tutors have indicated the need for developing such a resource which will help with forward planning of classes and allow learners to engage in self-assessment of language skills. The resource will sit within a suite of resources envisaged in the strategy currently being developed by the Bòrd and other national partners.’

(Learner) Corpora and their application in language testing and assessment

Santiago de Compostela, 22 May 2013
Pre-conference workshop – Call for papers
Convenors: Marcus Callies (Bremen) and Sandra Götz (Giessen)

Corpora and corpus linguistic tools and methods are frequently used in the study of second language (L2) learning, most notably in Learner Corpus Research (LCR). LCR has contributed significantly to the description of interlanguages and many of its findings have resulted in useful applications for foreign language teaching and learning. Learner- and native-speaker corpora have also received increasing attention in the area of language testing and assessment (LTA; Barker 2010; Taylor & Barker 2008). Practical applications of corpora in LTA can range from corpus-informed to corpus-based and corpus-driven approaches, depending on how corpus data are actually put into practice, the aims and outcomes for LTA, and the degree of involvement of the researcher in the process of data retrieval, analysis and interpretation (Barker 2010; Callies, Zaytseva & Diez-Bedmar to appear).

More recently, researchers have also turned to corpora to inform, validate, and develop the way proficiency is operationalized in the Common European Framework of Reference for Languages (CEFR; Council of Europe 2001, 2009). While the CEFR has been highly influential in language testing and assessment, the way it defines proficiency levels using “can-do-statements” has been criticized, because they are often too impressionistic. For example, a learner at the C2 level is expected to maintain “consistent grammatical control of complex language”, whereas at C1 he/she should “consistently maintain a high degree of grammatical accuracy” (Council of Europe 2001, 2009). Such global, vague and underspecified descriptions have limited practical value to distinguish between proficiency levels and also fail to give in-depth linguistic details regarding individual languages or learners’ skills in specific registers. These shortcomings have led to an increasing awareness among researchers of the need to identify more specific linguistic descriptors or ‘criterial features’ which can be quantified by learner data. The aim of such corpus-based approaches is to add “grammatical and lexical details of English to CEFR’s functional characterisation of the different levels” (Hawkins & Filipovic 2012: 5).

While (learner) corpora have the potential to increase transparency, consistency and comparability in the assessment of L2 proficiency, several problems and challenges may also be encountered. One major difficulty is that “proficiency level” has often been a fuzzy variable in learner corpus compilation and analysis (Carlsen 2012), because, due to practical constraints, proficiency has mostly been operationalized and assessed globally by means of external criteria, typically learner-centred methods such as learners’ institutional status. However, recent studies show that global proficiency measures based on external criteria alone are not reliable indicators of proficiency for corpus compilation (Mukherjee 2009; Callies to appear 2013), and “hidden” differences in proficiency (e.g. Pendar & Chapelle 2008) often go undetected or tend to be disregarded in learner corpus analysis (e.g. Götz 2013). Thus, the field still seems to be in need of a corpus-based description of language proficiency to account for inter-learner variability and seek homogeneity in learner corpus compilation and L2 assessment. Another issue that has been intensively debated is the appropriate basis of comparison for learner corpus data, i.e. against what yardstick learner performance should be compared and evaluated.

The aim of this workshop is to discuss the benefits in terms of current practices and developments, but also the challenges and possible obstacles of using both native-speaker reference corpora and learner corpora for testing and assessing L2 proficiency. We thus invite submissions that provide case studies exemplifying how corpora can be used for the assessment of L2 proficiency in both speaking and writing. In particular, submissions should address one of the following topics:

• corpus compilation (types of corpus data and their usefulness for testing purposes; proficiency as a fuzzy variable in learner corpus compilation and analysis; homogeneity vs. variability in corpus composition)
• corpus comparability (e.g. as to register/genre or task setting and conditions, i.e. testing vs. non-testing contexts, prompt, timing, access to reference works)
• the operationalization of (types of) proficiency in corpus approaches to testing and assessment
• the use of corpora in data-driven approaches to the assessment of proficiency (e.g. using corpus data to validate or complement human rating as in studies based on error-tagged learner corpora, or using corpus data (partially) independently of human rating).

Abstracts are invited for this workshop and should be 400 to 500 words long (excluding references). They should be submitted by e-mail to the convenors by 1st February 2013. Notification of acceptance will be sent out in late February 2013.