
Third Workshop on Multilingual Surface Realisation

Barcelona, December 12th, 2020

MSR'20 - SIGGEN event



    The Multilingual Surface Realisation workshop series aims to bring together researchers interested in surface-oriented Natural Language Generation problems such as word order determination, inflection, functional word determination, paraphrasing, etc., especially in a multilingual context. The 2020 edition of the workshop, MSR’20, will incorporate the presentation of the results of the Surface Realisation Shared Task 2020 and of a number of technical papers on (M)SR topics. The workshop will be held at COLING'20 in Barcelona, on December 12th, 2020.
    For information related to the organisation of and attendance at the conference and workshops, please refer to the COLING FAQ.

Call for papers

    Natural Language Generation (NLG) is garnering growing interest both as a stand-alone task (e.g. data-to-text and text-to-text generation) and as a component task in larger applications (e.g. abstractive summarisation, dialogue-based interaction, question answering, etc.). Since 2017, three ‘deep’ NLG shared tasks have been organised focusing on English language generation from abstract semantic representations: WebNLG, SemEval Task 9, and E2E. In 2018 and 2019, we organised the first shared tasks that focused on multilingual surface realisation (SR'18 and SR'19). The paradigm shift in NLP from traditional supervised machine learning techniques to deep learning, with associated substantial improvements in quality, is beginning to have a particularly transformative effect on NLG.
    In parallel to the boost from deep learning techniques, the last few years have seen a push in the annotation of multilingual treebanks with Universal Dependencies (UD), such that resources for a number of languages are now available: currently, with version 2.6, 163 treebanks covering 92 languages can be downloaded freely. UD treebanks facilitate the development of applications that work potentially across all of the UD treebank languages in a uniform fashion. As has already been seen in parsing, these treebanks are also a good basis for multilingual shared tasks: a system that has been built for some of the languages may work, with some adjustments, for the other languages as well. SR’18 and SR’19 were the first steps in the direction of taking advantage of these rich resources for NLG. While the MSR workshops host the SR shared tasks and the presentation of the shared task results by the participating teams, they have a much broader scope, seeking to provide a forum also for other SR research and for more general work on the role of UD structures and the linkages they afford between the generation and parsing fields.
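UD treebanks are distributed in the tab-separated CoNLL-U format, from which the shared task inputs are derived. As a rough illustration only (this sketch is not part of the workshop or shared task materials, and the sample sentence and function name are invented for the example), a single CoNLL-U sentence can be read with nothing beyond the Python standard library:

```python
# Minimal illustrative sketch: reading one Universal Dependencies
# sentence in CoNLL-U format (10 tab-separated columns per token).
# The sample sentence below is invented for this example.

CONLLU_SAMPLE = """\
# text = The dog barks .
1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_
2\tdog\tdog\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\tbarks\tbark\tVERB\t_\t_\t0\troot\t_\t_
4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_
"""

def parse_conllu(block):
    """Return a list of token dicts with the columns most relevant
    to surface realisation: id, form, lemma, upos, head, deprel."""
    tokens = []
    for line in block.splitlines():
        # Skip comment lines (e.g. "# text = ...") and blanks.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        tokens.append({
            "id": int(cols[0]), "form": cols[1], "lemma": cols[2],
            "upos": cols[3], "head": int(cols[6]), "deprel": cols[7],
        })
    return tokens

tokens = parse_conllu(CONLLU_SAMPLE)
print([t["form"] for t in tokens])  # → ['The', 'dog', 'barks', '.']
```

In the SR shared tasks, inputs of this kind are further processed (e.g. word order information removed), and systems must reconstruct a well-formed sentence from the remaining lemma, morphology, and dependency information.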
    MSR’20 invites contributions on all topics related to multilingual and monolingual surface realisation in NLG, specifically including and encouraging reversible methods. We welcome all submissions that address problems of surface-oriented generation such as grammatical and/or information structure-driven word order determination, inflection, functional word determination, paraphrasing, etc. We particularly encourage the submission of papers that make a clear contribution to progress in robust multilingual surface generation, i.e. that present methods which are easily portable from one language to another and clearly scalable. Topics of interest include, but are not limited to:

Shared Task

    This year's shared task event uses the same training, validation and in-domain test sets as the SR’19 Shared Task, with the same data and resource restrictions (see the SR’20 webpage for details). However, this year’s shared task differs in two respects: (i) the addition of an "open" mode in each track, where there are no restrictions on the resources that can be used, and (ii) the introduction of new evaluation datasets.
    As in previous years, the goal of the shared task is to generate a well-formed sentence given the input structure, and there are two tracks with different levels of complexity:
    Full details of both tasks can be found on the SR'20 website.

Important Dates

Submissions

    We invite long papers (8 pages) and short papers (4 pages), both with unlimited pages for references. Final versions will be given one additional page (up to 9 and 5 pages, respectively, in the proceedings, still with unlimited pages for references).
    MSR 2020 uses a double-blind reviewing process. Papers must conform to the official COLING’20 style guidelines, be in PDF format, and be submitted via the Softconf START conference management system. The paper submission deadline for both long and short workshop papers and for SR’20 system descriptions is 8 October, 2020.
    To encourage inclusiveness and the presentation of speculative and recent work, inclusion in the conference proceedings will be made optional. The author’s preference should be indicated with the final submission.
    Multiple submissions policy: Multiple submissions are allowed, but the authors should indicate clearly whether they have submitted or plan to submit a paper with the same content to another venue.
    Templates, guidelines and other policies: Please refer to the COLING website for new policies for submission, review, and citation, and official style guidelines.

Registration

    For registration information, please visit the COLING’20 registration page.

Program

    The workshop will consist of technical presentations, the presentation of the shared task results, an invited talk and a discussion session.
    We are happy to announce that Yue Zhang (Westlake University) will be an invited speaker at the workshop:

14:00  Opening

14:15  Invited Talk: Yue Zhang
       AMR to text generation -- a brief review and a case study using back-parsing

15:00  Oral presentation
       The Third Multilingual Surface Realisation Shared Task: Overview and Evaluation Results
       Simon Mille, Anja Belz, Bernd Bohnet, Thiago Castro Ferreira, Yvette Graham, Leo Wanner

15:30  Break

15:50  Short presentation and Q&A with authors

       15:50  BME-TUW at SR’20: Lexical grammar induction for surface realization
              Gábor Recski, Ádám Kovács, Kinga Gémes, Judit Ács and Andras Kornai

       16:00  ADAPT at SR’20: How Preprocessing and Data Augmentation Help to Improve Surface Realization
              Henry Elder

       16:10  IMSurReal Too: IMS in the Surface Realization Shared Task 2020
              Xiang Yu, Simon Tannert, Ngoc Thang Vu and Jonas Kuhn

       16:20  Lexical Induction of Morphological and Orthographic Forms for Low-Resourced Languages
              Taha Tobaili

       16:30  NILC at SR’20: Exploring Pre-Trained Models in Surface Realisation
              Marco Antonio Sobrevilla Cabezudo and Thiago Pardo

       16:40  Surface Realization Using Pretrained Language Models
              Farhood Farahnak, Laya Rafiee, Leila Kosseim and Thomas Fevens

16:50  Break

17:00  Panel/Discussions

18:00  Closing

Proceedings

    You can download the proceedings from the ACL Anthology, where you can also find the details of the task results and of the participating systems.

Programme Committee

Contact

    Please send us an email at msr.organizers@gmail.com if you have any questions.

Organising committee

Simon Mille, TALN, Pompeu Fabra University, Barcelona, Spain
Anya Belz, University of Brighton, UK
Bernd Bohnet, Google Research, London, UK
Thiago Castro Ferreira, Federal University of Minas Gerais, Brazil
Yvette Graham, ADAPT Centre, Dublin City University, Ireland
Leo Wanner, TALN, Pompeu Fabra University and ICREA, Barcelona, Spain

Funding

    The workshop organisation was supported by:
    (1) Science Foundation Ireland (sfi.ie) under the SFI Research Centres Programme, co-funded under the European Regional Development Fund, grant number 13/RC/2106 (ADAPT Centre for Digital Content Technology, www.adaptcentre.ie), at Dublin City University;
    (2) the Applied Data Analytics Research & Enterprise Group, University of Brighton, UK; and
    (3) the European Commission under the H2020 programme, via contracts to UPF with the numbers 825079-STARTS (MindSpaces), 786731-RIA (CONNEXIONs) and 779962-RIA (V4Design).