Downloads

Teraman: A Tool for Word N-gram Extraction

Inserted by: Ing. Zdeněk Češka, Ph.D.
Date last modified: 29.12.2013
Year added: 2007
Size: 413 kB
Number of downloads: 3
Abbreviation: teraman

Product description

Teraman is a tool for extracting word N-grams from large text datasets. Because it processes the data in batches, it can handle text collections that are much larger than the available memory. Processing consists of three steps: text pre-processing and indexing, N-gram counting, and de-indexing. The tool is written in C# and requires the .NET Framework 2.0 to run. More details about Teraman are available in our paper "Teraman: A Tool for N-gram Extraction from Large Datasets", published at the IEEE ICCP 2007 international conference.
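
The three steps named above can be illustrated with a small in-memory sketch. This is not Teraman's implementation (which is written in C# and works in batches on disk-sized data); it is a toy Python example that assumes whitespace tokenization and lowercasing as the pre-processing, and all function names are hypothetical.

```python
from collections import Counter


def index_words(text):
    """Step 1 (assumed form): lowercase, split on whitespace, map each word to an integer ID."""
    vocab = {}
    ids = []
    for word in text.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab)  # assign the next free ID
        ids.append(vocab[word])
    return ids, vocab


def count_ngrams(ids, n):
    """Step 2: count every run of n consecutive word IDs."""
    return Counter(tuple(ids[i:i + n]) for i in range(len(ids) - n + 1))


def deindex(counts, vocab):
    """Step 3: translate ID tuples back into readable word N-grams."""
    words = {word_id: word for word, word_id in vocab.items()}
    return {" ".join(words[i] for i in gram): c for gram, c in counts.items()}


def extract_ngrams(text, n):
    """Run the three steps in order on a single in-memory string."""
    ids, vocab = index_words(text)
    return deindex(count_ngrams(ids, n), vocab)
```

Calling `extract_ngrams("a b a b c", 2)` counts the bigrams of that string. Working on integer IDs rather than strings keeps the counting step compact, which is one plausible reason for separating indexing from counting; handling inputs larger than memory would additionally require writing partial counts to disk and merging them, which this sketch omits.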


Download

The use of this product is governed by the following license: CC-BY-NC-S

Creative Commons Attribution-NonCommercial-ShareAlike



Product files

# | Title | Description | Size
1. | TMRG_Teraman.zip | | 406 kB