Extract key phrases from a single document

Question

I need to extract relevant key phrases from a single document. Since I don't have a lot of documents, TF-IDF doesn't really work.

Currently I'm using TextRank. It produces okay-ish results: some really good phrases along with a lot of garbage.

Is there a better algorithm to use for this? Can anyone give me a rundown of available options?

Real-world use case: I'm developing a help desk app that comes with a Knowledge Base (a bunch of articles, think of it as an FAQ). When a user writes a new support ticket, I want to extract key phrases from it and find the most relevant KB articles. Overall there is not enough data to train a model, so I think I need to compare sets of key phrases.

score 3 · Accepted Answer · answered Nov 22 '17 at 16:03

A useful search term for your case is "single-document keyword extraction". A good paper on the topic has this abstract:

We present a new keyword extraction algorithm that applies to a single document without using a corpus. Frequent terms are extracted first, then a set of cooccurrence between each term and the frequent terms, i.e., occurrences in the same sentences, is generated. Co-occurrence distribution shows importance of a term in the document as follows. If probability distribution of co-occurrence between term a and the frequent terms is biased to a particular subset of frequent terms, then term a is likely to be a keyword. The degree of biases of distribution is measured by the $\chi^2$-measure. Our algorithm shows comparable performance to tfidf without using a corpus.

You can find the paper here.

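In short, the paper ranks candidate keywords by the $\chi^2$-measure it defines: the more a term's co-occurrence with the frequent terms deviates from what would be expected, the higher it ranks.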
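To make the idea in the abstract concrete, here is a rough Python sketch. It is a simplified variant, not the paper's exact method: the function name `chi2_keywords`, the tiny `STOPWORDS` set, and the `n_frequent` / `top_k` parameters are my own assumptions, and the expected distribution is taken from the column totals of the co-occurrence table. Any refinements in the paper beyond the abstract are left out.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption; use a fuller list in practice).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "it",
             "for", "on", "with", "that", "this", "my", "i", "when", "was", "not"}


def tokenize(sentence):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def chi2_keywords(text, n_frequent=10, top_k=5):
    """Rank terms by how biased their co-occurrence with frequent terms is."""
    # One set of terms per sentence; co-occurrence means "same sentence".
    sentences = [set(tokenize(s)) for s in re.split(r"[.!?\n]+", text)]
    sentences = [s for s in sentences if s]

    # Step 1: frequent terms, by number of sentences they appear in
    # (ties broken alphabetically so the result is deterministic).
    term_freq = Counter(w for s in sentences for w in s)
    by_freq = sorted(term_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    frequent = [w for w, _ in by_freq[:n_frequent]]

    # Step 2: co-occurrence counts between each term and each frequent term.
    cooc = defaultdict(Counter)
    for s in sentences:
        for w in s:
            for g in frequent:
                if g != w and g in s:
                    cooc[w][g] += 1

    # Expected share of co-occurrences for each frequent term (column totals).
    col_total = Counter()
    for row in cooc.values():
        col_total.update(row)
    grand_total = sum(col_total.values())
    if grand_total == 0:
        return []
    p = {g: col_total[g] / grand_total for g in frequent}

    # Step 3: chi-square of each term's observed counts against expectation.
    # A term is never compared with itself, hence the g != w filter.
    scores = {}
    for w, row in cooc.items():
        n_w = sum(row.values())
        scores[w] = sum(
            (row[g] - n_w * p[g]) ** 2 / (n_w * p[g])
            for g in frequent
            if g != w and p[g] > 0
        )

    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

Calling `chi2_keywords(ticket_text)` returns a list of `(term, score)` pairs, highest score first. It scores single words only, so building multi-word phrases (for example by merging adjacent high-scoring words) would be a separate step.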

Thanks. Do you know if there are any Python implementations? I don't think I'm smart enough to do it myself. — Max Al Farakh, Nov 22 '17 at 16:39
@MaxAlFarakh https://www.airpair.com/nlp/keyword-extraction-tutorial — trollster, Nov 22 '17 at 16:58
@trollster I tried RAKE. Did not produce good results. I'll look into it again. — Max Al Farakh, Nov 22 '17 at 17:27
