Teraman: A Tool for Word N-gram Extraction

Inserted by:	Ing. Zdeněk Češka, Ph.D.
Date last modified:	29.12.2013
Year of insertion:	2007
Size:	413 kB
Number of downloads:	14
Abbreviation:	teraman

Product description

Teraman is a tool for extracting word N-grams from large text datasets. It processes the input in batches, so it can handle text collections that are much larger than the available memory. Processing consists of three steps: text pre-processing and indexing, N-gram counting, and de-indexing. The tool is written in C# and requires the .NET Framework 2.0 to run. More details about Teraman are available in our paper "Teraman: A Tool for N-gram Extraction from Large Datasets", published at the IEEE ICCP 2007 international conference.
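
The three steps can be illustrated with a minimal sketch. It is written in Python rather than C# and is not Teraman's actual code; all function names, the on-disk run format, and the choice to treat each input line as an independent unit (so that N-grams never span lines) are assumptions made for illustration. Words are mapped to integer IDs (indexing), N-grams of IDs are counted one batch at a time with each batch written to disk as a sorted run, and the runs are then merged while the IDs are translated back to words (de-indexing).

```python
import heapq
import os
import tempfile
from collections import Counter
from itertools import groupby


def index_tokens(tokens, vocab):
    """Step 1 (indexing): map each word to an integer ID, extending vocab as needed."""
    return [vocab.setdefault(tok, len(vocab)) for tok in tokens]


def count_ngrams(ids, n):
    """Step 2 (counting): count the N-grams of word IDs within one sequence."""
    return Counter(tuple(ids[i:i + n]) for i in range(len(ids) - n + 1))


def write_run(counts, directory):
    """Write one batch of counts to disk, sorted by N-gram (hypothetical format)."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run", text=True)
    with os.fdopen(fd, "w") as f:
        for gram in sorted(counts):
            f.write(" ".join(map(str, gram)) + "\t" + str(counts[gram]) + "\n")
    return path


def read_run(path):
    """Stream (ngram, count) pairs back from a sorted run file."""
    with open(path) as f:
        for line in f:
            key, count = line.rstrip("\n").split("\t")
            yield tuple(int(x) for x in key.split()), int(count)


def extract_ngrams(lines, n=2, lines_per_batch=1000):
    """Count word N-grams over an iterable of text lines, batch by batch."""
    vocab = {}
    with tempfile.TemporaryDirectory() as tmp:
        runs = []
        counts = Counter()
        for i, line in enumerate(lines, 1):
            ids = index_tokens(line.lower().split(), vocab)
            counts.update(count_ngrams(ids, n))
            if i % lines_per_batch == 0:
                # Flush the batch so only one batch of counts is held in memory.
                runs.append(write_run(counts, tmp))
                counts = Counter()
        if counts:
            runs.append(write_run(counts, tmp))

        # Step 3 (de-indexing): merge the sorted runs, sum the counts of equal
        # N-grams, and translate the IDs back to words.
        words = {i: w for w, i in vocab.items()}
        merged = heapq.merge(*(read_run(p) for p in runs))
        result = {}
        for gram, group in groupby(merged, key=lambda kv: kv[0]):
            result[" ".join(words[i] for i in gram)] = sum(c for _, c in group)
        return result
```

For brevity the sketch keeps the vocabulary and the final result in memory; a tool aimed at inputs larger than memory would presumably stream the merged counts to an output file instead. Passing an open file object as `lines` reads the input one line at a time.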

Product files

#	Title	Description	Size
1.	TMRG_Teraman.zip		406 kB
