Semantic labelling of OCR ground truth data, stored together with a METS metadata set.
Add the namespace http://www.ocr-d.de/GT/. We recommend gt as namespace prefix:
    xmlns:gt="http://www.ocr-d.de/GT/"

Set the XSD schema location to OCR-D_GT_schema.xsd:

    xsi:schemaLocation="file:///OCR-D_GT_schema.xsd" or URL...

See mets_example.xml.
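As a quick sanity check, the namespace declaration described above can be verified programmatically. The following is a minimal sketch using only the Python standard library. The function names and the one-line sample document are hypothetical and for illustration only; the sample is not taken from mets_example.xml. The METS namespace URI used in the sample is the standard Library of Congress one.

```python
import io
import xml.etree.ElementTree as ET

# Namespace URI recommended in the text above.
GT_NS = "http://www.ocr-d.de/GT/"

# Hypothetical minimal METS root element, for illustration only.
SAMPLE = (
    b'<mets:mets xmlns:mets="http://www.loc.gov/METS/" '
    b'xmlns:gt="http://www.ocr-d.de/GT/"/>'
)


def declared_namespaces(xml_bytes):
    """Return a {prefix: uri} map of the xmlns declarations in a document."""
    namespaces = {}
    # "start-ns" events yield (prefix, uri) tuples for each declaration.
    for _event, (prefix, uri) in ET.iterparse(
        io.BytesIO(xml_bytes), events=("start-ns",)
    ):
        # Keep the first declaration seen for each prefix.
        namespaces.setdefault(prefix, uri)
    return namespaces


def has_gt_namespace(xml_bytes):
    """True if the document declares the ground truth namespace under any prefix."""
    return GT_NS in declared_namespaces(xml_bytes).values()
```

Calling `has_gt_namespace` on the bytes of a METS file reports whether the namespace is declared at all; `declared_namespaces(...).get("gt")` additionally shows whether the recommended `gt` prefix is bound to it.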
The ontology is defined in DefaultLabelTypes_3.xml, taken from https://github.com/PRImA-Research-Lab/semantic-labelling
The XSD is generated by transforming that ontology with an XSLT stylesheet:

    java -jar ../saxon9he.jar -xsl:OCR-D_GT_labelschema_maker.xsl -s:DefaultLabelTypes_3.xml

The ontology is described in:
Clausner, C. and Antonacopoulos, A.: Ontology and framework for semantic labelling of document data and software methods. In: 13th IAPR International Workshop on Document Analysis Systems (DAS2018), 24-27 April 2018, Vienna, Austria. http://usir.salford.ac.uk/46896/
It is implemented as a set of Java tools at https://github.com/PRImA-Research-Lab/semantic-labelling