Medhat, Fady and Mohammadi, Mahnaz and Jaf, Sardar and Willcocks, Chris and Breckon, Toby and Matthews, Peter and McGough, Andrew Stephen and Theodoropoulos, Georgios and Obara, Boguslaw (2018) 'TMIXT : a process flow for Transcribing MIXed handwritten and machine-printed Text.', IEEE International Conference on Big Data. Seattle, WA, USA, 10-13 December 2018.
Abstract
Text recognition of scanned documents usually depends on the type of text, handwritten or machine-printed. Accordingly, recognition involves classifying the text category before deciding which recognition method to apply. This becomes more challenging when a document contains both handwritten and machine-printed text. In this work, we present a generic process flow for text recognition in scanned documents containing mixed handwritten and machine-printed text without the need to classify text in advance. We have realized the proposed process flow using several open-source image processing and text recognition packages. The volume of text documents used in organizations such as defense, and the speed at which they must be processed, cannot be handled by humans without a considerable amount of automation; the proposed workflow will handle them efficiently and effectively. The evaluation was performed using a specially developed variant of the IAM handwriting database, presented in this work, on which we achieved an average transcription accuracy of nearly 80% for pages containing both printed and handwritten text.
Item Type: | Conference item (Paper) |
---|---|
Full text: | Publisher-imposed embargo. (AM) Accepted Manuscript. File format: PDF (1548Kb) |
Status: | Peer-reviewed |
Publisher Web site: | http://cci.drexel.edu/bigdata/bigdata2018/index.html |
Date accepted: | 08 November 2018 |
Date deposited: | 15 November 2018 |
Date of first online publication: | 2018 |
Date first made open access: | No date available |