Text Corpora


Fact Checking Dataset

Sentiment Analysis Datasets

Czech Text Document Corpus

Czech Historical Named Entity Corpus

OCR Corpora and Tools

Image Corpora


ChronSeg: Dataset for Segmentation of Handwritten Historical Chronicles

Heimatkunde: Dataset for Multi-modal Historical Document Analysis

COMICORDA: Dialogue Act Recognition in Comic Books

Unconstrained Facial Images: Database for Face Recognition under Real-world Conditions

Historical Maps Corpora


Historical Map Dataset v 1.0: Dataset for Detection and Segmentation Tasks in Historical Maps

Historical Map Dataset v 2.0: Extended Dataset for Detection and Segmentation Tasks in Historical Cadastral Maps

Nomenclature Dataset: Dataset for Detection and Recognition of Handwritten Nomenclatures and Toponyms from Historical Cadastral Maps
