TSAR-2022






TSAR-2022 Shared Task on Lexical Simplification


Fully Virtual Workshop (Workshop Program)
TSAR-2022 Workshop @ EMNLP 2022

UPDATES

7th September -- Deadline for Registration. REGISTRATION CLOSED
8th September -- Release of Test set (without gold annotations). TEST SET RELEASED at https://github.com/LaSTUS-TALN-UPF/TSAR-2022-Shared-Task
8th September -- Submission Guidelines released to registered participants.
15th September -- SUBMISSION CLOSED
30th September -- Official Evaluation Results Published!
Official results (visualization): https://taln.upf.edu/pages/tsar2022-st/#results
Official results (.csv files): https://github.com/LaSTUS-TALN-UPF/TSAR-2022-Shared-Task/tree/main/results/official
Extended results (.csv files): https://github.com/LaSTUS-TALN-UPF/TSAR-2022-Shared-Task/tree/main/results/extended
30th September -- Test set release (with gold annotations)
Gold annotations (test_gold .tsv files): https://github.com/LaSTUS-TALN-UPF/TSAR-2022-Shared-Task/tree/main/datasets/test
15th October -- Submission of shared-task papers deadline
10th November -- Early registration deadline. At least one author of each accepted paper must register for EMNLP 2022 by the early registration deadline, which ends on 10 November 2022 at 11:59pm EDT.
ANNOUNCEMENT: -- The TSAR-2022 Workshop will be fully VIRTUAL/ONLINE

24th November -- ANNOUNCEMENT: Special Issue in the journal Frontiers in Artificial Intelligence
*** We are glad to announce the launch of a call for contributions to a Special Issue of the journal Frontiers in Artificial Intelligence on the topic of the Workshop and Shared Task: Text Simplification, Accessibility, and Readability. ***
Please check all details at https://www.frontiersin.org/research-topics/47943/text-simplification-accessibility-and-readability


24th November -- ANNOUNCEMENT: TSAR-2022 workshop Sponsorship by Frontiers
*** We are very happy to announce that Frontiers is sponsoring our TSAR 2022 workshop ***






Lexical Simplification is the process of reducing the lexical complexity of a text by replacing difficult words with expressions that are easier to read (or understand), while preserving the original information and meaning. Lexical Simplification (LS) aims to facilitate reading comprehension for different target readerships, such as foreign language learners and native speakers with low literacy levels or various kinds of reading impairments.

This new Lexical Simplification Shared Task features three similar datasets in three different languages: English, Brazilian Portuguese, and Spanish.

Guidelines


Definition of the task (for one instance): Given a sentence containing a complex word, systems should return an ordered list of “simpler” valid substitutes for the complex word in its original context. The list of simpler words (up to a maximum of 10) returned by the system should be ordered by the confidence the system has in its prediction (best predictions first). The ordered list must not contain ties.

An instance of the task for the English language is:
sentence That prompted the military to deploy its largest warship, the BRP Gregorio del Pilar, which was recently acquired from the United States.
complex word deploy

For this instance a system may suggest the following ranked substitutes: send, move, position, redeploy, employ, situate… Systems should only produce simplifications that are good contextual fits (semantically and syntactically). Participating teams can register (details below) for three different tracks, one per language: English, Spanish, and Brazilian Portuguese.

It is possible to participate in one, two or all three tracks. Participating teams will be allowed to submit up to 3 runs per track.


Evaluation Metrics

The evaluation metrics used in the TSAR-2022 Shared Task are the following:

Note 1: Potential@1, Precision@1 and MAP@1 will have the same value.
Note 2: The exact computation of the metrics will be provided in the official evaluation script.
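Note 1 can be illustrated with a minimal sketch for a single instance. The definitions below are common textbook-style formulations assumed here for illustration only; the function names and the gold set are hypothetical, and the official evaluation script remains the authoritative reference for the exact computation.

```python
from typing import List, Set


def potential_at_k(preds: List[str], gold: Set[str], k: int) -> float:
    """1.0 if at least one of the top-k predictions is a gold substitute."""
    return 1.0 if any(p in gold for p in preds[:k]) else 0.0


def precision_at_k(preds: List[str], gold: Set[str], k: int) -> float:
    """Fraction of the top-k positions filled by a gold substitute."""
    return sum(p in gold for p in preds[:k]) / k


def average_precision_at_k(preds: List[str], gold: Set[str], k: int) -> float:
    """Sum of precision-at-rank over the ranks (1..k) that hold a gold
    substitute, divided by k (assumed normalisation)."""
    hits = 0
    total = 0.0
    for rank, p in enumerate(preds[:k], start=1):
        if p in gold:
            hits += 1
            total += hits / rank
    return total / k


# Hypothetical gold set, NOT the real annotations for this instance.
gold = {"send", "move", "position"}
preds = ["launch", "send", "use"]

# At k=1 every metric only asks whether the top prediction is in the gold set.
at_1 = (
    potential_at_k(preds, gold, 1),
    precision_at_k(preds, gold, 1),
    average_precision_at_k(preds, gold, 1),
)
```

With k=1 each function reduces to the same check on the first prediction, which is why the three values coincide; for larger k they generally differ.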

Shared Task Paper Submission

Participating teams will be invited to submit system description papers of four pages, plus an unlimited number of pages for references. Each paper will be peer-reviewed by at least 2 reviewers (at least one member of each participating team will be required to help with the review process), and the papers will be published in the TSAR-2022 Workshop proceedings. Submissions will be made via SoftConf at https://softconf.com/emnlp2022/tsar-st/. Paper submissions must use the official EMNLP templates, which are available as an Overleaf template and as direct downloads (LaTeX and Word). More details on submission will be communicated to registered teams in due time.

References

Datasets

The datasets and the evaluation script will be available in this GitHub repository: https://github.com/LaSTUS-TALN-UPF/TSAR-2022-Shared-Task
A description of the dataset compilation for Spanish can be found in the paper ALEXSIS: A Dataset for Lexical Simplification in Spanish (see References in Guidelines Section). The compilation of the datasets for English and Portuguese follows a similar methodology with 25 annotators.

On 20th July we will release the trial files (10/12 examples).
On 8th September we will release the test_none files (369/376 instances).
On 30th September (or sooner) we will release the results and the test_gold files (369/376 instances).

IMPORTANT DATES

  • Registration opens: July 19, 2022
  • Release of sample/trial instances with gold annotations: July 20, 2022
  • Release of evaluation metrics and code: July 22, 2022
  • Registration deadline: September 7, 2022
  • Test set release (without gold annotations): September 8, 2022
  • Submission of systems' output due: September 15, 2022
  • Official results announced: September 30, 2022
  • Test set release (with gold annotations): September 30, 2022
  • Submission of shared-task papers deadline: October 15, 2022
  • Shared-task papers reviews due: November 1, 2022
  • Camera-ready deadline for shared-task papers: November 10, 2022
  • Early registration deadline: November 10, 2022
  • TSAR Workshop (Fully Virtual): December 8, 2022

Submission Format


Each run is submitted as a text file with a .tsv extension, with one line per instance. Each line contains the sentence, the complex word, and the proposed candidates in descending rank order (the top-ranked candidate first). All fields are separated by tab characters: the sentence from the complex word, the complex word from the first candidate, and each candidate from the next. Ties are not allowed. If no candidates are predicted for an instance, the substitution fields are left empty but a TAB character must still follow the complex word. The ordered list of predicted candidates must not include the complex word itself and must not contain repeated predictions. Predicted candidates must have the same morphological inflection as the complex word in the original sentence.
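The line format described above could be produced along these lines. This is a minimal sketch, not the organizers' tooling: the function name `format_submission_line` and the constant `MAX_CANDIDATES` are hypothetical, and exact string comparison against the complex word is an assumption.

```python
from typing import Iterable, List

MAX_CANDIDATES = 10  # the task definition allows up to 10 substitutes


def format_submission_line(sentence: str, complex_word: str,
                           candidates: Iterable[str]) -> str:
    """Build one tab-separated submission line for a single instance."""
    kept: List[str] = []
    for cand in candidates:
        cand = cand.strip()
        # Skip empty strings, the complex word itself, and repeats,
        # keeping the first occurrence so the ranking order is preserved.
        if not cand or cand == complex_word or cand in kept:
            continue
        kept.append(cand)
    kept = kept[:MAX_CANDIDATES]
    if not kept:
        # No predictions: a TAB must still follow the complex word.
        return f"{sentence}\t{complex_word}\t"
    return "\t".join([sentence, complex_word, *kept])


line = format_submission_line(
    "That prompted the military to deploy its largest warship.",  # shortened sentence
    "deploy",
    ["launch", "launch", "deploy", "dispatch", "use"],
)
```

Filtering before writing keeps the file within the stated rules regardless of what the upstream model returns. Morphological inflection is not checked here, since that requires language-specific tooling.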

SUBMISSION FORMAT TEMPLATES: DIFFERENT CASES IN SUBMISSION
  1. Submission format of the previous example with the set of predicted candidates in ranked order:
    That prompted the military to deploy its largest warship, the BRP Gregorio del Pilar, which was recently acquired from the United States. deploy launch dispatch use develop employ

  2. Submission format of the previous example with an empty set of predicted candidates:
    That prompted the military to deploy its largest warship, the BRP Gregorio del Pilar, which was recently acquired from the United States. deploy

  3. Example of a submission with repeated predicted candidates. The example will be accepted by the evaluation script, but the repeated candidate will not be taken into account for evaluation (please avoid repeating predicted candidates):
    That prompted the military to deploy its largest warship, the BRP Gregorio del Pilar, which was recently acquired from the United States. deploy launch launch dispatch use develop employ

  4. Example of a submission with the target complex word of the same instance in the list of predicted candidates. The example will be accepted by the evaluation script, but the target complex word appearing in the set of predicted substitutions will not be evaluated (please avoid complex words in the list of predicted candidates):
    That prompted the military to deploy its largest warship, the BRP Gregorio del Pilar, which was recently acquired from the United States. deploy launch deploy dispatch use develop employ


  5. Example of a correctly formatted submission with wrongly inflected substitutes. The example will be accepted by the evaluation script, but the wrongly inflected substitutes won't match the gold substitutions (please provide the predicted substitutes in the correct morphological form, otherwise they won't match):
    That prompted the military to deploy its largest warship, the BRP Gregorio del Pilar, which was recently acquired from the United States. deploy launches dispatch use develop employs
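Cases 1 to 4 above can be checked locally before submitting with a small reader such as the one below. It is a sketch under stated assumptions, not the official evaluation script: the name `parse_submission_line` is hypothetical, fields are assumed to be tab-separated as the format description states, and dropping repeats and the target word merely mirrors the behaviour described in cases 3 and 4.

```python
from typing import List, Tuple


def parse_submission_line(line: str) -> Tuple[str, str, List[str]]:
    """Split one submission line into sentence, complex word, and candidates."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        raise ValueError("expected at least a sentence and a complex word")
    sentence, complex_word = fields[0], fields[1]
    candidates: List[str] = []
    for cand in fields[2:]:
        # Ignore the empty trailing field (case 2), repeats (case 3),
        # and the target complex word itself (case 4).
        if cand and cand != complex_word and cand not in candidates:
            candidates.append(cand)
    return sentence, complex_word, candidates


# Hypothetical short line combining cases 3 and 4.
_, word, cands = parse_submission_line(
    "A short sentence.\tdeploy\tlaunch\tlaunch\tdeploy\tdispatch\n"
)
```

Wrongly inflected substitutes (case 5) pass through unchanged, because detecting them would need a morphological analyser for each language.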

SUBMISSION FILES GUIDELINES
Registered participants will receive the instructions on how to submit the runs.

Registration

REGISTRATION CLOSED
If you are participating in our shared task, please register your team through this form.

Results

More detailed results of the participating systems can be found at: https://github.com/LaSTUS-TALN-UPF/TSAR-2022-Shared-Task/tree/main/results/extended
Note: Acc@1, MAP@1, Potential@1, and Precision@1 give the same results as per their definitions.
English

Spanish

Portuguese

Organizers

Sanja Štajner

NLP Researcher, Germany

Horacio Saggion

Chair in Computer Science and Artificial Intelligence and Head of the LaSTUS Lab in the TALN-DTIC, Universitat Pompeu Fabra

Marcos Zampieri

Assistant Professor at the Rochester Institute of Technology

Matthew Shardlow

Senior Lecturer at Manchester Metropolitan University

Daniel Ferrés

Post-Doctoral Research Assistant at LaSTUS Lab. at TALN-DTIC, Universitat Pompeu Fabra

Kai North

PhD student at the Rochester Institute of Technology

Kim Cheng Sheang

PhD student at LaSTUS Lab. at TALN-DTIC, Universitat Pompeu Fabra


CONTACT

Feel free to send us messages at: