Functional sample - Fact Search and Analysis Tool (Nástroj pro vyhledávání a analýzu faktů)

Inserted by: Ing. Miloslav Konopík, Ph.D.
Date last modified: 17.4.2021
Year of insertion: 2021
Size: 339 kB
Number of downloads: 3

Product description

The Fact Search and Analysis Tool (Fact Search for short) analyzes and compares several semantic and keyword-based document retrieval methods. It is designed for news databases, although it can be easily adapted to related data. The current implementation is built on the Czech News Agency (ČTK) archive of news articles from 2000 to 2019. We implement classic keyword search based on TF-IDF [1] as well as state-of-the-art Transformer-based neural networks [2, 3] for semantic search. The latter models are trained with fact-checking in mind, i.e., deciding whether a text supports or refutes a claim. The application can also be used for related question-answering tasks.

The current models are trained on a Czech version of the FEVER [4] Wikipedia fact-checking dataset, which was developed by the CTU team. Follow-up models will be trained on an annotated fact-checking dataset built directly on the ČTK data, which is currently being collected (the annotation application is closely related to Fact Search).

From the user's perspective, Fact Search allows real-time document search in extensive textual databases while comparing multiple search methods side by side. Along with the retrieved documents, it reports statistics of the search procedures and a statistical description of the document distributions. It also provides prediction explanations at the word or sentence level, which help assess the quality of a retrieval model and, more importantly, help users focus on the relevant parts of the retrieved text. The application further contains an initial version of a classifier module that gives confidence levels for the veracity of a claim with respect to the news database.

[1] Htut, Phu Mon, Samuel R. Bowman, and Kyunghyun Cho. "Training a ranking function for open-domain question answering." arXiv preprint arXiv:1804.04264 (2018).
[2] Chang, Wei-Cheng, et al. "Pre-training tasks for embedding-based large-scale retrieval." arXiv preprint arXiv:2002.03932 (2020).
[3] Reimers, Nils, and Iryna Gurevych. "Sentence-BERT: Sentence embeddings using siamese BERT-networks." arXiv preprint arXiv:1908.10084 (2019).
[4] Thorne, James, et al. "FEVER: a large-scale dataset for fact extraction and verification." arXiv preprint arXiv:1803.05355 (2018).
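
The keyword-search baseline named in the product description (TF-IDF) can be illustrated with a minimal sketch. This is not the Fact Search implementation: the function names (tokenize, build_index, search), the toy English documents, and the smoothed IDF variant are all assumptions chosen for illustration, and the actual tool runs on the ČTK archive with its own preprocessing.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def tfidf_vector(tokens, idf):
    """Build a sparse TF-IDF vector; terms unseen in the corpus are dropped."""
    counts = Counter(tokens)
    return {term: count * idf[term] for term, count in counts.items() if term in idf}


def build_index(documents):
    """Compute IDF weights and one TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumed, commonly used variant): ln((1+N)/(1+df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [tfidf_vector(tokens, idf) for tokens in tokenized]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, idf, vectors, top_k=3):
    """Return (document index, score) pairs, best match first."""
    query_vec = tfidf_vector(tokenize(query), idf)
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, score) for score, i in scored[:top_k]]


# Hypothetical toy corpus standing in for a news archive.
DOCS = [
    "The government approved the new state budget on Monday.",
    "The national football team won the championship final.",
    "Parliament debated the budget deficit and tax changes.",
]

idf, vectors = build_index(DOCS)
results = search("state budget approved", idf, vectors)
for doc_id, score in results:
    print(f"{score:.3f}  {DOCS[doc_id]}")
```

In this sketch the semantic methods would replace tfidf_vector with a dense sentence embedding while keeping the same cosine-similarity ranking, which is what makes a side-by-side comparison of retrieval methods straightforward.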
The result was created within the grant: TL02000288 - Transformation of Journalism Ethics in the Advent of Artificial Intelligence

This project is co-financed with the state support of the Technology Agency of the Czech Republic within the ETA Program.


The use of this product is governed by the following license: KIV-ZCU-EULA

ZCU/KIV End User License Agreement

Product files

1. DolozeniSplneni_V4.docx (351 kB)