
I am researching a project that would include document scanning. What I want to achieve is something like the document scanner in the iOS Notes app.

Here the input is a single image from a smartphone camera. The app automatically guesses the boundaries of the document within the image and then flattens the document. It undoes the perspective (keystone) distortion caused by the camera angle, and it even handles multiple segments of the document sitting at different angles, for instance a letter that has just come out of an envelope.

What techniques or tools should I look into to understand how to achieve something like this?

sak

1 Answer


For the boundary estimation, you can use a clustering technique such as k-means. In a grayscale image, the black text has low values (close to 0) and the white surroundings have high values (close to 255). Running a simple k-means algorithm with 2 centroids on the pixel intensities gives you an optimal threshold for separating the text from its surroundings. With that threshold, you can find where the text starts and where it ends. Once you have the 4 corners of the text region, use OpenCV's perspective transform (`cv2.getPerspectiveTransform` followed by `cv2.warpPerspective`) to correct the image.
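A minimal pure-Python sketch of the thresholding step, assuming the image is already grayscale and given as a list of rows of 0 to 255 values. The function names are hypothetical. k-means with k = 2 is run on the 1-D intensities, and the threshold is taken as the midpoint between the two centroids. The bounding box is axis-aligned, so this assumes the text region is roughly upright; its four corners are what you would pass on to the perspective transform.

```python
def kmeans_threshold(pixels, iterations=20):
    """Cluster 0-255 intensities into two groups (k = 2) and return the
    midpoint between the dark and light centroids as a threshold."""
    dark, light = float(min(pixels)), float(max(pixels))  # initial centroids
    for _ in range(iterations):
        # Assign each pixel to the nearest centroid (ties go to "dark").
        dark_group = [p for p in pixels if abs(p - dark) <= abs(p - light)]
        light_group = [p for p in pixels if abs(p - dark) > abs(p - light)]
        # Move each centroid to the mean of its group (keep it if the group is empty).
        new_dark = sum(dark_group) / len(dark_group) if dark_group else dark
        new_light = sum(light_group) / len(light_group) if light_group else light
        if new_dark == dark and new_light == light:
            break  # converged
        dark, light = new_dark, new_light
    return (dark + light) / 2


def text_bounding_box(image):
    """Return (top, left, bottom, right) of the pixels darker than the
    k-means threshold, or None if no pixel is darker."""
    threshold = kmeans_threshold([p for row in image for p in row])
    rows = [y for y, row in enumerate(image) if any(p < threshold for p in row)]
    cols = [x for x in range(len(image[0])) if any(row[x] < threshold for row in image)]
    if not rows or not cols:
        return None
    return rows[0], cols[0], rows[-1], cols[-1]


def corners(box):
    """(x, y) corners of the box, clockwise from top-left."""
    top, left, bottom, right = box
    return [(left, top), (right, top), (right, bottom), (left, bottom)]


if __name__ == "__main__":
    image = [
        [250, 250, 250, 250, 250],
        [250,  10,  20, 250, 250],
        [250,  15,  30,  40, 250],
        [250, 250, 250, 250, 250],
    ]
    box = text_bounding_box(image)
    print(box)           # (1, 1, 2, 3)
    print(corners(box))  # [(1, 1), (3, 1), (3, 2), (1, 2)]
```

On real photos you would normally let a library do the clustering or thresholding over a NumPy array; the loop version is only meant to show what the two centroids and the midpoint threshold are.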

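To show what the OpenCV perspective-transform step computes, here is a pure-Python sketch of the 3x3 matrix that maps four source corners onto four destination corners. It fixes the bottom-right entry to 1 and solves the resulting 8x8 linear system by Gaussian elimination. The helper names are hypothetical; in practice you would call `cv2.getPerspectiveTransform(src, dst)` with two float32 arrays of four points, then `cv2.warpPerspective(image, M, (width, height))` to resample the whole image. The corner order in `src` and `dst` must match, and no three corners may be collinear.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corners (three or more collinear?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def perspective_matrix(src, dst):
    """3x3 homography H (H[2][2] = 1) with H * (x, y, 1) ~ (u, v, 1)
    for each of the four (x, y) -> (u, v) corner pairs."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (g*x + h*y + 1) = a*x + b*y + c, rearranged to be linear in the unknowns
        a.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        b.append(v)
    h = solve(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def warp_point(m, x, y):
    """Apply the homography to one point, dividing by the projective weight."""
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return ((m[0][0] * x + m[0][1] * y + m[0][2]) / w,
            (m[1][0] * x + m[1][1] * y + m[1][2]) / w)


if __name__ == "__main__":
    src = [(10, 20), (210, 40), (200, 300), (5, 280)]  # tilted page, clockwise from top-left
    dst = [(0, 0), (200, 0), (200, 300), (0, 300)]     # flat 200 x 300 output
    m = perspective_matrix(src, dst)
    print(warp_point(m, 210, 40))  # approximately (200.0, 0.0)
```

`cv2.warpPerspective` does the same mapping for every pixel of the output image (looking up each one in the source via the inverse matrix), which is what actually flattens the page.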
Sina