ChronSeg v 1.0: Dataset for Segmentation of Handwritten Historical Chronicles


General Information

ChronSeg is a dataset for segmenting historical handwritten chronicles into text, image and background classes. It is composed of documents provided by Porta fontium and is freely available for education and research purposes. Any other use is strictly excluded.

The main part of the dataset consists of 5 chronicles totaling 38 double-sided pages, 18 of which contain images. There is also an experimental part that contains 20 printed pages from documents of different types.

Technical Details

Three files with the same base name, differing only in extension, are provided for each document page:

  1. The scanned page as a .jpg file.
  2. Annotation of the scanned page as a .xml file in PAGE format.
  3. Pixel-wise ground-truth of the scanned page as a .png file. R, G and B channels represent text, image and background classes respectively.
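The file layout and channel encoding described above can be sketched in Python as follows. This is a minimal illustration, not part of the dataset's tooling: the helper names `page_files` and `pixel_class` and the example path are hypothetical, and the sketch assumes that a pixel's class is the channel with the highest value (the description does not say whether channels can overlap). Reading the .png itself is left to an image library of your choice.

```python
from pathlib import Path

# Channel order as described above: R = text, G = image, B = background.
CLASSES = ("text", "image", "background")


def page_files(scan_path):
    """Return the three files that share a base name with the given scan."""
    scan = Path(scan_path)
    return {
        "scan": scan.with_suffix(".jpg"),
        "annotation": scan.with_suffix(".xml"),    # PAGE format
        "ground_truth": scan.with_suffix(".png"),  # pixel-wise classes
    }


def pixel_class(rgb):
    """Map one (R, G, B) ground-truth pixel to a class name.

    Assumption: the class is the channel with the highest value;
    ties resolve to the earliest channel in R, G, B order.
    """
    values = tuple(rgb)
    return CLASSES[values.index(max(values))]


files = page_files("chronicle/page_001.jpg")  # hypothetical path
print(files["ground_truth"].name)  # page_001.png
print(pixel_class((255, 0, 0)))    # text
```

Taking the strongest channel rather than testing for an exact value of 255 keeps the sketch tolerant of anti-aliased or slightly off-value pixels, if any exist in the ground truth.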

License

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0), so commercial use in any form is excluded.

Download

For further information about this dataset, please see the paper below:

  • J. Baloun, P. Král and L. Lenc, ChronSeg: Novel Dataset for Segmentation of Handwritten Historical Chronicles, FullText, Bibtex.
  • Please cite this paper if you use this dataset in your experiments.

    If you have additional questions or comments related to this dataset, please do not hesitate to contact the authors: Josef Baloun (balounj@kiv.zcu.cz) or Pavel Král (pkral@kiv.zcu.cz).