Dalibor Fiala, Ph.D.

Document Classification
Keywords:	classification, clustering, categorization, classifier, machine learning, spam, filter, unsolicited mail, content
Description:	Use of inductive machine learning methods in the classification of short text documents. The research includes implementations of the Itemsets classifier, the Naive Bayes classifier, NBCI (Naive Bayes Combined with Itemsets), and the TFxIDF classifier, in addition to clustering algorithms. These classification algorithms are also applied to the filtering of unsolicited mail (spam).
Status:	Finished
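
As a rough illustration of how a Naive Bayes classifier can separate spam from legitimate mail, here is a minimal sketch in Python. It is not the project's implementation: the multinomial model with add-one smoothing, the whitespace tokenizer, the class name NaiveBayes, and the four training messages are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical pre-processing: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()           # all words seen during training

    def train(self, docs):
        for text, label in docs:
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for word in tokenize(text):
                counts[word] += 1
                self.vocab.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                score += math.log(
                    (counts[word] + 1) / (total_words + len(self.vocab))
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this example.
TRAIN = [
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
]

model = NaiveBayes()
model.train(TRAIN)
print(model.classify("buy cheap money"))
print(model.classify("monday project meeting"))
```

Working with log probabilities avoids numeric underflow on longer messages, and the smoothing term keeps a single unseen word from driving a class probability to zero.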

People on this project:

Jiří Hynek

Phone: +420 603492837
E-mail: jhynek@kiv.zcu.cz
WWW: http://www.kiv.zcu.cz/staff/osobni.php?id_osoby=147&lang=EN

Jiri, a co-founder of the Text-Mining Research Group, works as a lecturer at the Dept. of Computer Science and Engineering. His research interests include machine learning and language-related problems. Jiri’s teaching activity is focused on good writing style and technical writing in general.

Karel Ježek

Phone: +420 377632475, 377632400
E-mail: jezek_ka@kiv.zcu.cz
WWW: http://www-kiv.zcu.cz/~jezek_ka/

Karel is the group coordinator and the supervisor of PhD students working on the group's research projects.

Michal Toman

E-mail: mtoman@kiv.zcu.cz

Michal graduated from UWB in 2003 with a specialization in software engineering. He is currently a PhD student interested in information retrieval, multilingual text processing, word sense disambiguation, and knowledge discovery.

Roman Tesař

Phone: +420 377632479
E-mail: roman.tesar@gmail.com
WWW: http://www.sweb.cz/romant1/CV.pdf

Roman is a PhD student at the Department of Computer Science and Engineering, Faculty of Applied Sciences, University of West Bohemia in Pilsen, Czech Republic. His work is focused on the utilization of word n-grams in text classification and document filtering.

Zdeněk Češka

E-mail: zceska@kiv.zcu.cz
WWW: http://www.kiv.zcu.cz/en/department/members/detail.html?login=zceska

Zdeněk has worked for various international companies in the field of software engineering. He holds a Master's degree and a PhD in computer science and engineering. His research interests include mathematics and algorithms, plagiarism detection, multilingual processing, text classification, and related fields.

Petr Grolmus

E-mail: indy@civ.zcu.cz

Petr was a co-founder of the Text-Mining Research Group. His work focused mainly on identifying user profiles based on users' behavior on the Web.

Related Downloads:

Teraman v1.0
Size:	2 kB
Desc:	Teraman is a tool for N-gram extraction from large text datasets. It works in batches, so it can process texts that are much larger than the available memory. Processing consists of three steps: pre-processing and indexing, counting N-grams, and de-indexing. The tool is written in C# and requires the .NET Framework 2.0 to run.
Related:	Document Classification
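
The three steps named in the Teraman description can be sketched roughly as follows. This is an illustrative Python sketch, not Teraman's C# code: the function names batches and count_ngrams, the whitespace tokenizer, and the batch_size parameter are assumptions, and the counts are kept in memory here, whereas a tool handling texts larger than memory would presumably write intermediate results to disk.

```python
from collections import Counter
from itertools import islice


def batches(lines, size):
    # Yield successive lists of at most `size` lines.
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def count_ngrams(lines, n=2, batch_size=1000):
    word_to_id = {}     # index: word -> integer id
    id_to_word = []     # reverse index: integer id -> word
    counts = Counter()  # n-gram (tuple of ids) -> frequency
    carry = []          # last n-1 ids of the previous batch

    for chunk in batches(lines, batch_size):
        # Step 1: pre-process and index (replace words by integer ids).
        ids = list(carry)
        for line in chunk:
            for word in line.lower().split():
                if word not in word_to_id:
                    word_to_id[word] = len(id_to_word)
                    id_to_word.append(word)
                ids.append(word_to_id[word])

        # Step 2: count n-grams over the id sequence of this batch.
        for i in range(len(ids) - n + 1):
            counts[tuple(ids[i:i + n])] += 1

        # Keep the tail so n-grams spanning two batches are not lost.
        carry = ids[-(n - 1):] if n > 1 else []

    # Step 3: de-index (turn id tuples back into readable n-grams).
    return {
        " ".join(id_to_word[i] for i in gram): c
        for gram, c in counts.items()
    }


sample = ["to be or not", "to be"]
result = count_ngrams(sample, n=2, batch_size=1)
print(result)
```

Counting on integer ids rather than strings keeps the keys small, which is one plausible reason for the indexing and de-indexing steps. The sketch treats the input as one continuous token stream, so n-grams may span line boundaries.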