From linguistic predicate-arguments to Linked Data and ontologies: Extracting n-ary relations

Tutorial at the 13th Extended Semantic Web Conference (ESWC 2016)



Abstract

In this tutorial, we will give a comprehensive overview of, and hands-on training in, the use of linguistic predicate-argument structures and lexical resources for extracting n-ary relations from text as a means to populate the Semantic Web. In terms of theoretical background, the tutorial will cover:

  • representative predicate-argument extraction paradigms, namely Combinatory Categorial Grammar (CCG) and Dependency Grammar (DG), and their implications for the expressiveness and completeness of the resulting Linked Data;
  • linguistic resources, with a focus on FrameNet, as a backbone for delineating the meaning of the extracted predicates and their arguments;
  • state-of-the-art frame detection systems, presented to illustrate the strengths and weaknesses of the different predicate-argument extraction paradigms;
  • vocabularies and models for generating Linked Data and ontologies.

Description

Recent years have witnessed an increasingly close interaction between the Semantic Web (SW) and Natural Language Processing (NLP) communities. Among other things, this has fostered the use of NLP tools and resources that detect linguistic predicate-argument structures in order to extract rich text conceptualizations that explicitly cater for n-ary relations. Such tools include so-called shallow-semantic parsers, which identify linguistic predicates and their arguments, and semantic role labellers, which assign semantic types to predicate-argument structures. In parallel, lexical resources that specify predicates in terms of their arguments have been made available. While some of these resources focus on the syntactic realization of verb senses (e.g. PropBank), others go further and provide an inventory of predicative meanings with a fully semantic specification of their arguments (e.g. VerbNet and FrameNet). These form convenient starting points for publishing text contents as Linked Data and for producing ontologies, without the need for a priori schemas.
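To make the idea concrete, here is a minimal sketch, not taken from the tutorial materials, of how a single FrameNet-style predicate-argument structure could be published as an n-ary relation in RDF. It assumes the common modelling pattern in which the relation instance itself becomes a node, typed with its frame, with one property per semantic role. The example.org namespaces and the function names `frame_to_triples` and `to_ntriples` are hypothetical placeholders rather than an established vocabulary or API, and plain Python tuples stand in for an RDF library.

```python
# Minimal sketch: reify one frame instance (an n-ary relation) as RDF triples.
# The namespaces below are hypothetical placeholders, not an official FrameNet vocabulary.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FRAME_NS = "http://example.org/frame/"  # hypothetical frame/role vocabulary
DATA_NS = "http://example.org/data/"    # hypothetical namespace for extracted entities


def frame_to_triples(instance_id, frame, roles):
    """Turn a predicate with semantically typed arguments into (s, p, o) triples.

    The relation itself becomes a node (instance_id) typed with its frame;
    each argument is attached through a property named after its semantic role.
    """
    node = DATA_NS + instance_id
    triples = [(node, RDF_TYPE, FRAME_NS + frame)]
    for role, filler in sorted(roles.items()):  # sorted for deterministic output
        triples.append((node, FRAME_NS + role, DATA_NS + filler))
    return triples


def to_ntriples(triples):
    """Serialise IRI-only (s, p, o) triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# "Mary bought a book from John": one predicate, three typed arguments.
triples = frame_to_triples(
    "buy_1",
    "Commerce_buy",
    {"Buyer": "Mary", "Goods": "book_1", "Seller": "John"},
)
print(to_ntriples(triples))
```

Because the relation gets its own node, further arguments (for instance a time or price role) can be attached later without redesigning a schema, which is one reason frame-based representations suit settings where no a priori ontology exists.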

Given the importance of n-ary relation extraction from text and the relatively recent advances in the field, the objective of this tutorial is to familiarize participants with the tools and resources involved in extracting such relations from text as a means of populating the Semantic Web. To this end, it covers three intertwined key topics: predicate-argument extraction, semantic typing of the extracted predicate-argument structures, and models for generating ontologies and Linked Data representations.

More specifically, we will first introduce participants to two main paradigms of predicate-argument extraction, namely Combinatory Categorial Grammar (CCG) and Dependency Grammar (DG), discuss their respective pros and cons, and outline the implications that each paradigm entails for the extracted predicate-argument structures. Examples of the latter include the handling of linguistic phenomena such as lexical and functional prepositions, copulas, and so forth. Complementing the background material, state-of-the-art tools, namely Boxer and the TALN-UPF transducers, which implement CCG-based and shallow semantic parsing respectively, will be used for illustration and practical experimentation. Second, we will review the lexical and linguistic resources used for the semantic typing of predicate-argument structures, including PropBank, VerbNet and FrameNet. Alongside these, we will present representative state-of-the-art tools, including Semafor, the TALN-UPF frame semantic parser, FRED and PIKES. Special attention will be paid to how their different predicate-argument extraction methodologies affect their performance.

Program

 9:00 -  9:20   Part I - Introduction
 9:20 - 10:30   Part II - From text to semantic structures: Methods, resources and tools for deep linguistic text analysis
10:30 - 11:00   Coffee break
11:00 - 12:30   Part III - From linguistic representations to RDF/OWL: Porting Natural Language to Semantic Web
12:30 - 14:00   Lunch break
14:00 - 14:45   Part IV - Example applications of n-ary relation extraction & evaluation methods
14:45 - 15:30   Hands-on session I - Getting familiar with the different tools and linguistic resources
15:30 - 16:00   Coffee break
16:00 - 17:30   Hands-on session II - Explore the generation of OWL representations & Discussion

Tutorial materials

The tutorial comprises a series of talks and practical sessions with mini-assignments, providing participants with hands-on experience of state-of-the-art tools and methodologies.

Tutorial slides: Download PDF
Examples: Download PDF


Tutoring team

Gerard Casamayor, Natural Language Processing (TALN), Pompeu Fabra University, Barcelona, Spain
Stamatia Dasiopoulou, Natural Language Processing (TALN), Pompeu Fabra University, Barcelona, Spain
Simon Mille, Natural Language Processing (TALN), Pompeu Fabra University, Barcelona, Spain

Registration

For registration information, please visit http://2016.eswc-conferences.org/

This tutorial is supported by the European Commission under contract numbers FP7-ICT-610411 and H2020-645012-RIA.