OCR Improvement tool.

This tool was created on request. I was not sure whether to publish it, because it seemed rather "custom made" for OCR in one specific situation. I have since decided that it may be interesting to more people, or may lead to ideas for other applications, so here it is.


PDFOverlayer is meant to combine an optimized scanned image with the text output of an OCR program. Again, it uses iTextSharp for PDF manipulation. It lets you:

  • Overlay the image in a separate layer above the OCR'd text
  • Stamp the image over the original OCR'd text

Note that two inputs are required: 1. the OCR'd text without the scanned image (so the scanning/OCR application must offer an option to create OCR output separate from, or without, the image), and 2. the scanned image (which you may have optimized for readability).
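
For readers who want to try the stamping idea outside the tool, here is a rough sketch. It is not how PDFOverlayer works internally (that uses iTextSharp). It swaps in Python and the third-party pypdf library, and assumes a recent pypdf release in which merge_page accepts the over argument. The function name overlay_pdfs and its parameters are made up for illustration. Only the "stamp" variant is shown; putting the image in a separate PDF layer is not covered.

```python
from pathlib import Path


def overlay_pdfs(text_pdf: Path, image_pdf: Path, out_pdf: Path,
                 image_on_top: bool = True) -> int:
    """Stamp page i of image_pdf onto page i of text_pdf.

    Returns the number of pages written. Hypothetical helper, not part
    of PDFOverlayer.
    """
    # Assumption: pypdf is installed (pip install pypdf). It is imported
    # here so the module itself loads without it.
    from pypdf import PdfReader, PdfWriter

    text_reader = PdfReader(str(text_pdf))
    image_reader = PdfReader(str(image_pdf))
    writer = PdfWriter()

    # zip() stops at the shorter document, so unmatched pages are dropped.
    for text_page, image_page in zip(text_reader.pages, image_reader.pages):
        # over=True draws the image content after (above) the OCR text;
        # over=False draws it underneath.
        text_page.merge_page(image_page, over=image_on_top)
        writer.add_page(text_page)

    with open(out_pdf, "wb") as fh:
        writer.write(fh)
    return len(writer.pages)
```

A call such as overlay_pdfs(Path("ocr_text.pdf"), Path("scan.pdf"), Path("combined.pdf")) would write the merged result to combined.pdf. The file names are placeholders.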


Andres Aule reports that PDF Overlayer also works excellently for adding custom headers and footers to a PDF file. All you have to do is create a file (in MS Word, OpenOffice, whatever) that contains the desired header, footer, or page numbers, and then apply it in PDF Overlayer as the other layer. The only thing to keep in mind is that its pages should be approximately the same size as those of the underlying file.
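
If you want to check the "approximately the same size" condition before overlaying, a small comparison like the one below may help. It is only a sketch: the function name and the 2% tolerance are my own assumptions, not something PDF Overlayer defines, and the sizes are taken to be (width, height) pairs in PDF points (1/72 inch).

```python
from typing import Tuple

# (width, height) in PDF points; 1 point = 1/72 inch.
Size = Tuple[float, float]


def sizes_roughly_match(base: Size, overlay: Size, tolerance: float = 0.02) -> bool:
    """Return True if width and height each differ by at most `tolerance`,
    measured relative to the base page. The 2% default is an arbitrary
    assumption, not a value taken from PDF Overlayer."""
    return all(abs(b - o) <= tolerance * b for b, o in zip(base, overlay))


A4 = (595.28, 841.89)
LETTER = (612.0, 792.0)

print(sizes_roughly_match(A4, (595.0, 842.0)))  # rounded A4 against exact A4
print(sizes_roughly_match(A4, LETTER))          # A4 against US Letter
```

With these numbers, the rounded A4 page passes and the US Letter page does not, because its height differs from A4 by roughly 6%.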