SUMMA

 

Overview

SUMMA is a toolkit for the development of text summarization systems.


SUMMA NGrams Computation

Functionality

Flexible n-gram computation program. Creates n-gram annotations of different granularities based on specified annotations.

Parameters of the Resource

  • inputAS: the annotation set where the annotations live
  • annType: the annotation base for computing n-grams (e.g., Token)
  • feature: the feature used to build the components of the n-gram (e.g., the string of the Token)
  • ngram: the n for computing the n-grams. Annotations for 1-gram, 2-gram, ..., n-gram will be computed, as well as an ngramSEQ annotation
  • sentAnn: the annotation representing the sentence (e.g. Sentence)
  • outputAS: the name of the annotation set in which to store the n-grams
  • normalize: true if you want your n-grams to be lowercased
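
The effect of the ngram and normalize parameters can be illustrated with a minimal sketch in plain Python. This is not SUMMA's implementation and does not use the GATE API: the function name sentence_ngrams and its arguments are hypothetical, tokens stand in for the chosen feature of each annType annotation inside one sentAnn span, and joining components with a space is an assumption. The ngramSEQ annotation is not modelled.

```python
from typing import Dict, List


def sentence_ngrams(tokens: List[str], max_n: int,
                    normalize: bool = False) -> Dict[int, List[str]]:
    """Return every 1-gram up to max_n-gram for the tokens of one sentence."""
    if max_n < 1:
        raise ValueError("max_n must be at least 1")
    if normalize:
        # Mirrors normalize=true: lowercase each component first.
        tokens = [t.lower() for t in tokens]
    result: Dict[int, List[str]] = {}
    for n in range(1, max_n + 1):
        # Slide a window of size n over the sentence; the list is empty
        # when the sentence has fewer than n tokens.
        result[n] = [" ".join(tokens[i:i + n])
                     for i in range(len(tokens) - n + 1)]
    return result


grams = sentence_ngrams(["The", "cat", "sat"], max_n=2, normalize=True)
print(grams[1])  # ['the', 'cat', 'sat']
print(grams[2])  # ['the cat', 'cat sat']
```

Computing n-grams one sentence at a time, as in this sketch, keeps any n-gram from spanning a sentence boundary; whether the resource uses sentAnn in exactly this way is an assumption.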

Restriction

The document must contain the annotations and features that the resource needs in order to work correctly. The table of statistics that you use needs to be computed from annotations similar to those for which you want statistics computed, i.e., if you want to compute "Token" statistics, then your IDF table should be one with Token statistics in it.


Copyright 2002-2014 Universitat Pompeu Fabra