I have scanned documents that contain both typed English text and handwritten text, such as dates, signatures, or other notes. Can someone point me to resources (preferably in Python) that detect or separate these two types of objects in an image?
Commercial solutions such as Azure, AWS Textract, or GCP would also work if they can do this job.
I don't want to go the route of collecting images and then training a model to detect or classify the objects, as I believe this should already be a solved problem.

Sandeep Bhutani
  • Do the scanned documents have the same format (i.e. the typed text is always in the same places)? If yes, it would limit your task to handwritten text recognition, because you already know the typed text content. – Nicolas Martin Aug 24 '21 at 08:05
  • For simplicity, we can assume that the format would be the same; however, the text cannot always be the same. Also, the handwritten text can actually overlap the typed text, for example a signature. – Sandeep Bhutani Aug 24 '21 at 17:33
  • Could you group documents so that they have the same typed text and then apply adapted handwriting recognition, instead of developing a universal tool? I ask because a universal tool might take too long to develop. If you can group documents and limit the activity to handwriting recognition, it could be much easier. – Nicolas Martin Aug 25 '21 at 08:50
  • In a financial document with different values, it can be difficult. However, if I arrange it, what would be your suggestion on the approach? – Sandeep Bhutani Aug 25 '21 at 17:44
  • I would suggest using Tesseract, which can perform text field detection as well as typed and handwritten text recognition. It is a multi-purpose tool that can handle your different formats: https://www.pyimagesearch.com/2018/09/17/opencv-ocr-and-text-recognition-with-tesseract/ – Nicolas Martin Aug 26 '21 at 07:19
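
The fixed-format idea raised in the comments could be sketched roughly as follows: if a blank copy of the form is available, any ink that appears in the filled scan but not in the blank template is a candidate for handwriting. This is only a minimal illustration under strong assumptions, not a tested solution: the function names `handwriting_mask` and `bounding_box` and the `ink_threshold` value are hypothetical, both images are assumed to be grayscale arrays of the same size, and the scan is assumed to be already aligned with the template. Handwriting that overlaps typed text (such as a signature across a printed line) is lost at the overlapping pixels.

```python
import numpy as np


def handwriting_mask(template, filled, ink_threshold=128):
    """Return a boolean mask of pixels that are ink in `filled` but not in `template`.

    Both inputs are grayscale arrays (0 = black, 255 = white) of equal shape.
    `ink_threshold` is an assumed cutoff; real scans need tuning.
    """
    template = np.asarray(template)
    filled = np.asarray(filled)
    if template.shape != filled.shape:
        raise ValueError("template and filled scan must have the same shape")
    template_ink = template < ink_threshold  # typed text known from the blank form
    filled_ink = filled < ink_threshold      # typed text plus anything added later
    return filled_ink & ~template_ink


def bounding_box(mask):
    """Return (top, left, bottom, right) of the True region, or None if empty."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None
    return int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1])


# Toy data standing in for real scans: a white page with one "typed" line.
template = np.full((6, 8), 255, dtype=np.uint8)
template[1, 1:7] = 0

# The filled copy has the same typed line plus a small "handwritten" blob.
filled = template.copy()
filled[3:5, 2:5] = 0

mask = handwriting_mask(template, filled)
box = bounding_box(mask)
print(box)
```

On real scans the two images rarely line up pixel for pixel, so some registration step and noise tolerance would be needed before a difference like this is usable; the region found this way could then be cropped and passed to whichever handwriting recognizer is chosen.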

0 Answers