SUMMA

Overview

SUMMA is a toolkit for the development of text summarization systems.

SUMMA IDF Tables

Loads a previously computed IDF table from a file. The table stays in memory for later use.

Parameters of the Resource

  • encoding: the character encoding of the table file
  • tableLocation: the location of the table on disk. The resources directory of summa_plugin provides aquaint.idf, a table for English, and spanish_IDFs.lst, a table for Spanish. You can check the table format by opening either file in any text editor. The first line is the number of documents used to compute the IDF values; each remaining entry contains a word and the number of documents containing that word.
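The table layout described above can be read with a few lines of code. The following is a minimal sketch in Python, not SUMMA's own loader: the function names are hypothetical, the whitespace separator between word and count is an assumption (check the shipped tables for the actual delimiter), and the log(N / df) formula is one common IDF definition that SUMMA may or may not use.

```python
import io
import math


def load_idf_table(stream):
    """Read a table whose first line is the document count N and whose
    remaining lines each hold a word and its document frequency."""
    lines = [line.strip() for line in stream if line.strip()]
    num_docs = int(lines[0])
    doc_freq = {}
    for line in lines[1:]:
        # Assumption: word and count are separated by whitespace.
        # Splitting from the right keeps the count as the last field.
        word, count = line.rsplit(None, 1)
        doc_freq[word] = int(count)
    return num_docs, doc_freq


def idf(word, num_docs, doc_freq):
    # Assumption: plain log(N / df); words missing from the table
    # are treated as if they occurred in a single document.
    return math.log(num_docs / doc_freq.get(word, 1))


# A tiny in-memory table standing in for a file on disk.
sample = io.StringIO("4\nthe 4\nsummary 1\n")
num_docs, doc_freq = load_idf_table(sample)
```

For a real table, the stream would come from something like open(tableLocation, encoding=encoding), using the two resource parameters listed above.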

Restriction

None.

SUMMA Corpus IDF Table

Functionality

Computes IDFs on the fly for a processed corpus. The table stays in memory for later use.

Parameters of the Resource

  • corpus: the corpus to use for creating the table.
  • inputAnnotationSet: the annotation set containing the tokens used to compute the statistics
  • inputAnnotationType: the annotation type (the token) you want statistics for
  • featureName: the token feature used to compute the statistics
  • normalised: a boolean indicating whether words should be lowercased before computing the statistics
  • tableLocation: where to store the table
  • createTable: a boolean indicating whether the table should be dumped to disk for future use.
  • encoding: the encoding of the table
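To illustrate what these parameters control, here is a minimal sketch in Python of counting document frequencies over a tokenized corpus. It is not SUMMA's implementation: the corpus is modelled as plain lists of token strings instead of GATE annotations, the function names are hypothetical, and the space-separated output layout is an assumption based on the table format described for the pre-computed tables.

```python
def document_frequencies(corpus, normalised=True):
    """Count, for each word, the number of documents that contain it.

    corpus is a list of documents, each a list of token strings
    (a stand-in for the token feature values of each document).
    """
    doc_freq = {}
    for tokens in corpus:
        # A word counts once per document, however often it occurs.
        seen = {t.lower() if normalised else t for t in tokens}
        for word in seen:
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return len(corpus), doc_freq


def dump_table(num_docs, doc_freq):
    # Assumed layout: document count first, then one "word count" per line.
    lines = [str(num_docs)]
    lines += [f"{word} {count}" for word, count in sorted(doc_freq.items())]
    return "\n".join(lines) + "\n"


corpus = [["The", "cat"], ["the", "dog"], ["A", "cat", "cat"]]
num_docs, doc_freq = document_frequencies(corpus)
table_text = dump_table(num_docs, doc_freq)
```

With normalised set to False, "The" and "the" are counted as different words. Writing table_text to a file opened with the chosen encoding would correspond to setting createTable to true.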

Restriction

Your corpus should contain the expected annotations and features. The path to the table should be valid.

Copyright 2002-2014 Universitat Pompeu Fabra