How does the vector space model differ from traditional B-tree indexes?

Question

I've asked the following question on StackOverflow: https://stackoverflow.com/questions/71819288/mongodb-vs-elasticsearch-indexing-parallel-arrays

I think that getting some theoretical background on this subject would help me grasp the context better, and maybe I'll then be able to answer my own question.

Note: I've changed the title. Here are some more comments for clarification.

How do ElasticSearch-like solutions differ from traditional databases in storing, retrieving, and modeling data? Most importantly, if I only want to do exact matching and filtering, is there a real benefit to using such a technology, assuming I have set up my NoSQL database indexes perfectly?

I think B-trees are just one way to implement an "inverted" index. (As I understand "inverted" is just a bit of a misnomer from ancient computer history, it basically allows reverse (inverted?) lookup, nothing "inverted" in an inverted index; or in other words, every index is "inverted" (allows reverse lookup), otherwise it wouldn't be an index). As I understand, if people talk about inverted indexes it just tells you they studied information systems (as I did). Please correct me if I am wrong! — TilmannZ, Apr 12 '22 at 12:24
Thanks, you're right. I was also a bit uncertain about the thread name, so I've changed it. Hopefully it is clearer now. — BySpecops., Apr 12 '22 at 18:41
@TilmannZ Just for reference, here's a previous answer that explains where the term "inverted index" comes from. https://cs.stackexchange.com/a/130833/6553 — Pseudonym, Apr 13 '22 at 01:34

Rinkesh P · Answer 1 · 2022-04-13T04:14:24.390

*In response to the original question, inverted index vs btree indexes*

Let's look at what these two types of indexes are.

Forward Index/Index

Here the search key (the attribute on which the index is built) is the name of the document. Consider a telephone book, where names are sorted alphabetically. To find the phone number of a person whose name starts with N, you go directly to the page where the names starting with N begin and then keep searching until you find the name you are looking for.

Inverted Index

Here the search key is a part of the content of any of the documents. Take, for instance, the index at the back of a reference book. The keywords are listed alphabetically and the index shows the page numbers on which a given keyword appears. In this case too, if you want to find where a keyword starting with P occurs, you go directly to the page where the keywords for P are listed and then search further.

Knowing both of these ideas, you might wonder whether you can use a forward index to find the pages where a keyword occurs, or an inverted index to find a person's phone number. The answer is yes: theoretically, you can search for anything using any type of index. However, the main purpose of an index is to make the lookup take less time, and whether it does depends on what you are searching for and which index you are using.
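
As a rough illustration of that trade-off, here is a minimal Python sketch. The documents, the names `forward_index` and `inverted_index`, and both helper functions are made up for this example. Both functions return the same answer, but the forward index has to scan every document, while the inverted index needs a single keyed lookup.

```python
from collections import defaultdict

# Hypothetical toy data: document name -> text content.
documents = {
    "doc1": "parsing and indexing",
    "doc2": "indexing with trees",
    "doc3": "parsing expressions",
}

# Forward index: keyed by document name, like the phone book.
forward_index = {name: text.split() for name, text in documents.items()}

# Inverted index: keyed by a term from the content, like a book's back index.
inverted_index = defaultdict(set)
for name, terms in forward_index.items():
    for term in terms:
        inverted_index[term].add(name)


def find_with_forward(term):
    """Answer a keyword query with the forward index: scans every document."""
    return {name for name, terms in forward_index.items() if term in terms}


def find_with_inverted(term):
    """Answer the same query with the inverted index: one keyed lookup."""
    return set(inverted_index.get(term, set()))
```

Calling `find_with_forward("parsing")` and `find_with_inverted("parsing")` gives the same set of document names; only the amount of work differs.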

Consider a database that stores reviews of a movie by area. Assume it has three columns: id, area_code, and review. id is the primary key, area_code is a unique integer representing some area, and review is the text of the review.

Let's say you want to find out what the users in area 007 think about your movie. In this case you create an index on the area_code column and can easily find the reviews for that area.

Now suppose you want to find out how many users gave your movie a positive review. A review is free text of highly variable length and content, with a lot of redundancy. But you can assume that certain terms such as "excellent", "mind bending", or "masterpiece" are likely to appear in a positive review. So here you use an inverted index, which tells you which reviews contain those positive terms, giving you a rough estimate of the count.

And finally, you can combine both scenarios: to find the good reviews from a particular area, you use both indexes.
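
The three scenarios above can be sketched in a few lines of Python. Everything here is illustrative: the rows, the `POSITIVE_TERMS` set, and the function names are assumptions, and the tokenization is deliberately naive (single words only, so a phrase like "mind bending" would need extra handling).

```python
from collections import defaultdict

# Hypothetical rows: (id, area_code, review).
rows = [
    (1, 7, "an excellent film, a masterpiece"),
    (2, 7, "boring and too long"),
    (3, 9, "excellent acting"),
    (4, 9, "not my taste"),
]

# Assumed list of terms that signal a positive review.
POSITIVE_TERMS = {"excellent", "masterpiece"}

area_index = defaultdict(set)  # area_code -> row ids (ordinary column index)
term_index = defaultdict(set)  # term -> row ids (inverted index on review)
for row_id, area_code, review in rows:
    area_index[area_code].add(row_id)
    for term in review.lower().replace(",", " ").split():
        term_index[term].add(row_id)


def positive_review_ids():
    """Rows whose review contains at least one positive term."""
    ids = set()
    for term in POSITIVE_TERMS:
        ids |= term_index.get(term, set())
    return ids


def positive_review_ids_in_area(area_code):
    """Combine both indexes by intersecting their row-id sets."""
    return area_index.get(area_code, set()) & positive_review_ids()
```

Here `len(positive_review_ids())` gives the rough count of positive reviews, and `positive_review_ids_in_area(7)` narrows it to one area without scanning the review text again.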

This is a theoretical description of what those indexes are and how they are used. When you implement them, though, you need to think about performance: reducing disk accesses or, as in my example at the beginning, reducing the number of pages you have to go through. One way to approach this is to use B-trees.

You can read up on B-trees anywhere, but to answer your question: both the forward and the inverted index can use B-trees, because in both cases you have an index file that needs to be accessed efficiently, and a B-tree lets you do exactly that.
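
To show why ordered storage helps for either kind of index, here is a small Python sketch. It is not a B-tree: it uses a sorted list with binary search (`bisect`) as a simplified in-memory stand-in for the same property, namely that keys are kept in order so a lookup does not have to scan everything. The keys, posting lists, and function names are hypothetical.

```python
import bisect

# Hypothetical index file contents: sorted keys and, for each key, the
# list of ids (pages, rows, documents) where that key occurs.
keys = ["excellent", "masterpiece", "parsing", "trees"]
postings = [[1, 3], [1], [5, 8], [8]]


def lookup(term):
    """Exact-match lookup by binary search over the sorted keys."""
    pos = bisect.bisect_left(keys, term)
    if pos < len(keys) and keys[pos] == term:
        return postings[pos]
    return []


def prefix_range(prefix):
    """All keys starting with `prefix`, found by jumping to the first match."""
    lo = bisect.bisect_left(keys, prefix)
    hi = bisect.bisect_left(keys, prefix + "\uffff")
    return keys[lo:hi]
```

`prefix_range("p")` mirrors the "go directly to the page where the keywords for P are listed" step from the book-index example. A real B-tree provides the same ordered access while grouping many keys per node, which is what keeps the number of disk reads low.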

Thanks for your answer. What about the "Vector Space Model"? Theoretically how storing the JSON data and putting the index in some NoSQL database (as MongoDB) differs from storing the data as mappings (as it is on ElasticSearch). — BySpecops., Apr 12 '22 at 18:29
I am afraid I don't know much about that; my answer is in response to your original question. But a tip: you shouldn't change the title of your question, because that may invalidate the answers you have received so far, and there may be others who have the same doubt. So unless your question is badly downvoted or closed, I suggest you keep it the way it is once you have received answers. If you think your doubt was actually something else, just ask a new question. That way it is much clearer, and you contribute more to the community :) — Rinkesh P, Apr 13 '22 at 04:10
