Teraman: A Tool for Word N-gram Extraction

Inserted by: Ing. Zdeněk Češka, Ph.D.
Date last modified: 29.12.2013
Year of insertion: 2007
Size: 413 kB
Number of downloads: 14

Product description

Teraman is a tool for extracting word N-grams from large text datasets. Our approach is based on batch processing, so it can process text documents much larger than the available memory. The process consists of three steps: text pre-processing and indexing, N-gram counting, and de-indexing. The tool is written in C# and requires the .NET Framework 2.0 to run. More details about Teraman are available in our paper "Teraman: A Tool for N-gram Extraction from Large Datasets", published at the IEEE ICCP 2007 international conference.
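The three steps above can be illustrated with a minimal Python sketch. It is not Teraman's C# implementation: the tokenization rule, the default batch size, and all function names (`tokenize`, `index_documents`, `count_in_batches`, `merge_runs`, `deindex`, `extract_ngrams`) are hypothetical. The sketch assumes a common batch-and-merge scheme. Words are mapped to integer IDs, N-grams are counted one batch of documents at a time into sorted runs, the runs are combined with a k-way merge that sums the counts of identical N-grams, and the IDs are finally mapped back to words. For brevity the runs stay in memory here. A tool handling data larger than memory would instead write each run to disk and merge the files.

```python
from __future__ import annotations

import heapq
import itertools
import re
from collections import Counter
from typing import Iterable, Iterator


def tokenize(text: str) -> list[str]:
    # Hypothetical pre-processing: lowercase and keep runs of word characters.
    return re.findall(r"\w+", text.lower())


def index_documents(docs: Iterable[str], vocab: dict[str, int]) -> Iterator[list[int]]:
    # Step 1 (pre-processing & indexing): replace each word by an integer ID,
    # assigning new IDs in order of first appearance.
    for doc in docs:
        yield [vocab.setdefault(word, len(vocab)) for word in tokenize(doc)]


def count_in_batches(indexed_docs: Iterable[list[int]], n: int, batch_size: int) -> list[list[tuple]]:
    # Step 2a (counting): count the N-grams of `batch_size` documents at a time
    # and emit each batch as a run sorted by N-gram. N-grams do not cross
    # document boundaries. A real tool would write each run to disk.
    runs, counter = [], Counter()
    for i, ids in enumerate(indexed_docs, start=1):
        counter.update(tuple(ids[j:j + n]) for j in range(len(ids) - n + 1))
        if i % batch_size == 0:
            runs.append(sorted(counter.items()))
            counter = Counter()
    if counter:
        runs.append(sorted(counter.items()))
    return runs


def merge_runs(runs: list[list[tuple]]) -> Iterator[tuple[tuple[int, ...], int]]:
    # Step 2b: k-way merge of the sorted runs; equal N-grams end up adjacent,
    # so their partial counts can be summed in a single pass.
    merged = heapq.merge(*runs)
    for gram, group in itertools.groupby(merged, key=lambda item: item[0]):
        yield gram, sum(count for _, count in group)


def deindex(counts: Iterable[tuple[tuple[int, ...], int]], vocab: dict[str, int]) -> Iterator[tuple[str, int]]:
    # Step 3 (de-indexing): map ID tuples back to space-separated words.
    words = {i: w for w, i in vocab.items()}
    for gram, count in counts:
        yield " ".join(words[i] for i in gram), count


def extract_ngrams(docs: Iterable[str], n: int = 2, batch_size: int = 1000) -> dict[str, int]:
    vocab: dict[str, int] = {}
    # count_in_batches consumes every document, so vocab is complete afterwards.
    runs = count_in_batches(index_documents(docs, vocab), n, batch_size)
    return dict(deindex(merge_runs(runs), vocab))
```

For example, `extract_ngrams(["the cat sat", "the cat ran", "a cat sat"], n=2, batch_size=2)` returns `{"the cat": 2, "cat sat": 2, "cat ran": 1, "a cat": 1}`. The count for "cat sat" is summed across the two batches during the merge. Working on integer IDs instead of strings keeps the sorted runs compact and makes comparisons during sorting and merging cheap.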


The use of this product is governed by the following license: CC-BY-NC-SA

Creative Commons Attribution-NonCommercial-ShareAlike

Product files

1. TMRG_Teraman.zip (406 kB)
