
I have a bunch of old automotive magazines from the 1900s all the way up to the 1990s, with (I imagine) some very old and long-forgotten info. The magazines have been scanned to PDF, OCR'd, and then converted into plain text. Now I need an AI solution to query all of this data.

I did try using a vector database (Weaviate) with a GPT-4 model, but the API crapped its pants on documents larger than 8,000 tokens. I tried breaking them up into smaller chunks, but we're talking a few gigabytes of text, which would take ages for OpenAI to consume. In any case, it was a dead end.
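For reference, the chunking I tried looked roughly like the sketch below. The magazines/ directory, chunk size, and overlap are just placeholder choices on my part, not anything Weaviate or OpenAI requires:

# Rough sketch of the token-based chunking I attempted, assuming the
# tiktoken package is installed and the OCR'd text files live in ./magazines/.
import os
import tiktoken

CHUNK_TOKENS = 500   # well under the 8,000-token limit I was hitting
OVERLAP = 50         # small overlap so sentences split at a boundary aren't lost

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer family used by GPT-4-era models

def chunk_text(text: str) -> list[str]:
    """Split text into overlapping chunks of roughly CHUNK_TOKENS tokens."""
    tokens = enc.encode(text)
    chunks = []
    step = CHUNK_TOKENS - OVERLAP
    for start in range(0, len(tokens), step):
        window = tokens[start:start + CHUNK_TOKENS]
        chunks.append(enc.decode(window))
    return chunks

for name in os.listdir("magazines"):
    with open(os.path.join("magazines", name), encoding="utf-8") as f:
        for i, chunk in enumerate(chunk_text(f.read())):
            # each chunk would then be embedded and inserted into Weaviate
            print(name, i, len(enc.encode(chunk)))

Even chunked like this, a few gigabytes of text means millions of chunks to embed, which is why the whole approach felt like a dead end.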

I am a software developer by trade (not a very good one, but I get by OK), so the technical aspects of this don't scare me. I just don't know whether I should be training a custom model or using some other existing solution. I don't know where to go from here.

Vince
