I need to solve a problem where a scan of multiple documents (contracts, invoices, bank statements) is stored in a single PDF, and I need to identify how many individual documents the PDF contains and which pages belong to which document.
This scenario arises, for example, when someone feeds a stack of documents into an automatic scanner, which then creates a single PDF from them. Each document is just a set of scanned page images; it may have one or more pages, and layouts may differ from one document to the next.
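To make the desired output concrete, here is a minimal Python sketch of the result I am after. It assumes that something (a classifier, a heuristic, whatever the right approach turns out to be) has already flagged each page that starts a new document. The function name `group_pages` and the `starts_new_doc` flags are hypothetical, and producing those flags is the actual open problem.

```python
from typing import List


def group_pages(starts_new_doc: List[bool]) -> List[List[int]]:
    """Group 1-based page numbers into documents.

    starts_new_doc[i] is True when page i + 1 is the first page of a
    document. These flags are hypothetical inputs; obtaining them is the
    part I do not know how to solve.
    """
    documents: List[List[int]] = []
    for page_number, is_start in enumerate(starts_new_doc, start=1):
        # The very first page always opens a document, even if unflagged.
        if is_start or not documents:
            documents.append([page_number])
        else:
            documents[-1].append(page_number)
    return documents


# Made-up example: a 6-page PDF holding a 2-page contract,
# a 1-page invoice and a 3-page bank statement.
flags = [True, False, True, True, False, False]
docs = group_pages(flags)
print(len(docs), docs)  # 3 [[1, 2], [3], [4, 5, 6]]
```

So the task reduces to deciding, for every page, whether it begins a new document; the document count and the page-to-document mapping follow from that.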
What would be a sensible AI approach to tackling this problem?