SUMMA IDF Tables

Loads a previously computed IDF table from a file. The table stays in memory for you to use.

Parameters of the Resource

encoding: the encoding of the table
tableLocation: the location of the table on disk. The resources directory of summa_plugin provides aquaint.idf, a table for English, and spanish_IDFs.lst, a table for Spanish. You can check the format of the tables by opening them in any text editor. The first line is the number of documents that were used to compute the IDF values; each remaining entry contains a word and the number of documents containing that word.
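The table layout described above can be read with a few lines of code. The following is a minimal Python sketch, not part of the plugin: the function names load_idf_table and idf are hypothetical, the word and its count are assumed to be separated by whitespace, and the formula log(N / df) is one common IDF definition that may differ from the one SUMMA applies.

```python
import io
import math


def load_idf_table(stream):
    """Read a document count followed by 'word count' entries."""
    # First line: number of documents used to compute the table.
    num_docs = int(stream.readline().strip())
    doc_freq = {}
    for line in stream:
        parts = line.split()  # assumption: whitespace-separated fields
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        word, count = parts
        doc_freq[word] = int(count)
    return num_docs, doc_freq


def idf(word, num_docs, doc_freq):
    # Assumed formula: log(N / df); unseen words are treated as df = 1.
    return math.log(num_docs / doc_freq.get(word, 1))


# Tiny in-memory table with made-up counts, standing in for a real file.
sample = io.StringIO("1000\nthe 990\nsummarization 12\n")
num_docs, doc_freq = load_idf_table(sample)
print(num_docs, doc_freq["summarization"])
print(idf("summarization", num_docs, doc_freq))
```

To read a real table, pass a file object opened with the matching encoding, for example open(path, encoding="utf-8"), in place of the in-memory sample.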

None.

Computes IDFs on the fly for a processed corpus. The table stays in memory for you to use.

corpus: the corpus to use for creating the table.
inputAnnotationSet: the annotation set containing the tokens used to compute the statistics
inputAnnotationType: the annotation type (for example, the token) you want the statistics for
featureName: the feature of the token used for the statistics
normalised: a boolean indicating if the word should be lowercased to compute the statistics
tableLocation: where you want to store your table
createTable: a boolean indicating if the table should be dumped to disk for future use.
encoding: the encoding of the table

Your corpus should contain the expected annotations and features. The path to the table should be valid.
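As an illustration of what the computed statistics look like, the sketch below counts, for each word, the number of documents that contain it, and writes the result in the layout described for tableLocation. It is a simplified stand-in, not the plugin's code: documents are plain lists of token strings rather than GATE annotations, the names document_frequencies and dump_table are hypothetical, and a single space is assumed as the separator.

```python
import io
from collections import Counter


def document_frequencies(documents, normalised=True):
    """documents: iterable of token-string lists, one list per document."""
    num_docs = 0
    doc_freq = Counter()
    for tokens in documents:
        num_docs += 1
        # A set ensures each word is counted once per document.
        words = {t.lower() if normalised else t for t in tokens}
        doc_freq.update(words)
    return num_docs, doc_freq


def dump_table(num_docs, doc_freq, stream):
    # First line: document count; then one 'word count' entry per line.
    stream.write(f"{num_docs}\n")
    for word in sorted(doc_freq):
        stream.write(f"{word} {doc_freq[word]}\n")


# Toy corpus of two documents, already tokenised.
corpus = [["The", "cat", "the"], ["the", "dog"]]
n, df = document_frequencies(corpus, normalised=True)
out = io.StringIO()
dump_table(n, df, out)
print(out.getvalue())
```

With normalised set to False, "The" and "the" are counted as different words, which mirrors the choice the normalised parameter offers.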