Nomenclature Dataset v1.0: Dataset for Detection and Recognition of Handwritten Nomenclatures and Toponyms from Historical Cadastral Maps

General Information

The Nomenclature Dataset is intended for the detection and recognition of so-called nomenclatures in historical cadastral maps. A nomenclature is a handwritten piece of text that identifies the position of an individual map sheet in the grid covering a larger region. The dataset covers two tasks:

1. Nomenclature Detection - finding the exact position of the nomenclature text within the map sheet
2. Nomenclature Recognition - transcribing the nomenclature by means of optical character recognition (OCR) or handwritten text recognition (HTR)

The dataset is freely available for education and research purposes. Any other use is strictly excluded!

The dataset contains 800 map sheets in total. It is divided into training, testing and validation parts that contain 650, 100 and 50 sheets respectively.

Technical Details

Two files with the same name are provided for each map sheet:

The scanned page as a .JPG file.
The annotation of the nomenclature position and its complete transcription as a tab-separated values file (extension .txt).
There are 3 classes in this dataset (N = nomenclature, H = handwritten toponym, P = printed toponym)
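
The file layout described above can be handled with a short script. The following is a minimal sketch in Python, not part of the dataset distribution: the function names `pair_sheets` and `read_annotation` are hypothetical, the annotation files are assumed to be UTF-8 encoded, and because the column order of the annotation files is not specified here, each row is returned as a raw list of fields without interpretation.

```python
import csv
from pathlib import Path


def pair_sheets(folder):
    """Map each shared file name (without extension) to (image path, annotation path).

    Sheets that lack either the .JPG or the .txt file are skipped.
    """
    files = list(Path(folder).iterdir())
    # Compare extensions case-insensitively, since the scans use an upper-case .JPG.
    images = {p.stem: p for p in files if p.suffix.lower() in {".jpg", ".jpeg"}}
    annotations = {p.stem: p for p in files if p.suffix.lower() == ".txt"}
    shared = sorted(images.keys() & annotations.keys())
    return {stem: (images[stem], annotations[stem]) for stem in shared}


def read_annotation(path):
    """Read a tab-separated annotation file into a list of rows (lists of strings).

    Assumption: the file is UTF-8 encoded. The meaning of each column is not
    documented here, so no field is interpreted or converted.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [row for row in reader if row]
```

Calling `pair_sheets` on a folder of one dataset part and then `read_annotation` on the second element of each pair yields the raw annotation rows per sheet; selecting rows by class (N, H or P) requires checking which column holds the class label in the actual files.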

License

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0), so commercial use in any form is excluded.

Please cite this paper if you use this dataset in your experiments.

Download

If you have additional questions or comments related to this dataset, please do not hesitate to contact the authors: Ladislav Lenc llenc@kiv.zcu.cz or Pavel Král pkral@kiv.zcu.cz.