Multilingual Document Generation
One of our major research focuses is Multilingual and Multimodal Natural Text Generation (MMNTG). We address all major aspects of MMNTG: discourse structure and layout planning, mode selection strategies, sentence planning, and the development of multilingual grammatical and morphological resources. Our linguistic framework is Meaning-Text Theory (MTT). The goal is to develop a large-coverage, open-source, MTT-based MMNT-Generator. In cooperation with external collaborators, we currently work on Catalan, English, Finnish, French, German, Polish, Portuguese, Swedish and Spanish.
Computational Lexicology / Lexicography
Within Computational Lexicology / Lexicography, we work primarily on the automatic recognition and classification of collocations in text corpora. So far, we have focused on Spanish and German. A further topic in this area is the organization of lexical resources and collocation dictionaries.
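As a rough illustration of what collocation recognition involves, the sketch below ranks adjacent word pairs by pointwise mutual information (PMI), a common association measure. It is only an assumption-laden baseline, not the method used in the work described here; the function name `pmi_bigrams`, the `min_count` threshold, and the toy English sentence are all hypothetical.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical baseline: pairs seen fewer than `min_count` times are
    skipped, because PMI is unreliable for rare events.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI > 0 means the pair co-occurs more often than chance predicts.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy corpus (assumed for illustration only).
tokens = "heavy rain fell and heavy rain stopped and the rain fell".split()
for pair, score in pmi_bigrams(tokens):
    print(pair, round(score, 2))
```

A realistic system would operate on lemmatized, syntactically parsed corpora rather than raw adjacent tokens, and classifying the recognized collocations is a separate step that this sketch does not attempt.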
Language-oriented Machine Learning
Within this area, we study the application of various paradigms of ML (supervised, unsupervised and reinforcement learning) to the acquisition of linguistic resources.
Automatic Text Summarization and Abstracting
We develop robust multilingual single-document and multi-document summarization technology. We maintain a set of resources for building summarization systems adapted to different needs and domains; these resources are being developed within the SUMMA system. We also work on abstractive text summarization and summarization evaluation.
Automatic Text Simplification
Text simplification is the process of transforming a text into an equivalent that is easier to understand for a target population. Simplified texts are appropriate for many groups of readers, such as language learners, elderly persons and people with other special reading and comprehension needs. At TALN we develop robust natural language processing technology to produce simplified versions of documents at both the syntactic and lexical levels.
Sentiment Analysis and Opinion Mining
We work on multilingual (English/Spanish) sentiment analysis using lexical resources. We focus on applying machine learning to linguistic and semantic features in order to classify opinionated texts. We also apply summarization techniques as a filtering step before opinion classification.
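To make the idea of lexicon-derived features concrete, here is a minimal sketch that turns a text into a small feature dictionary a classifier could consume. The tiny `POLARITY` lexicon, the feature names, and the function `lexicon_features` are assumptions made for illustration; they do not reflect the actual resources or feature set used in this work.

```python
# Hypothetical toy polarity lexicon mixing English and Spanish entries.
POLARITY = {
    "good": 1.0, "excellent": 1.0, "bueno": 1.0, "excelente": 1.0,
    "bad": -1.0, "terrible": -1.0, "malo": -1.0, "horrible": -1.0,
}


def lexicon_features(text, lexicon=POLARITY):
    """Compute simple polarity features from whitespace-separated tokens."""
    tokens = text.lower().split()
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return {
        "n_pos": sum(1 for s in scores if s > 0),
        "n_neg": sum(1 for s in scores if s < 0),
        "polarity_sum": sum(scores),
        # Share of tokens found in the lexicon; 0.0 for empty input.
        "coverage": len(scores) / len(tokens) if tokens else 0.0,
    }


print(lexicon_features("The food was excellent but the service was terrible"))
print(lexicon_features("Un hotel muy bueno"))
```

Feature dictionaries of this kind would normally be combined with other linguistic and semantic features and passed to a trained classifier; the sketch stops at feature extraction and ignores tokenization details, negation, and word sense.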