2

I have a fairly general problem and I wonder if it has a name. The problem statement, as best I can put it, is the following:

Let $I=\{i_1,i_2,\ldots,i_n\}$ be a set of items and let $C=\{C_1,C_2,\ldots,C_m\}$ be a set of containers. Each container is unique and contains a subset of the items in $I$. Let $C_{a,b} \subseteq C$ be the set of containers that contain both $i_a$ and $i_b$. Find all pairs $(i_a, i_b)$ with $a \neq b$ where $\lvert C_{a,b} \rvert \geq k$ for a given $k \in \mathbb{N}$.

If the items are people and the containers are times and places, then we are asking which people met at least $k$ times. If the items are groceries and the containers are shopping carts, then the problem is a variation of association rule learning. If the items are words and the containers are documents, then we are looking for a co-occurrence network.
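To make the problem statement concrete, here is a minimal sketch of the most direct approach: count every pair inside each container and keep the pairs that reach $k$. The function name `frequent_pairs` and the sample `meetings` data are made up for illustration. The cost grows with the sum of squared container sizes, so this is only a baseline, not a proposal for large inputs.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(containers, k):
    """Return {(a, b): count} for item pairs that share at least k containers.

    `containers` is an iterable of item collections (hypothetical input format).
    """
    counts = Counter()
    for items in containers:
        # Sort so each unordered pair is always counted under the same key.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= k}


# Illustrative data: each set is one "time and place", each string one person.
meetings = [
    {"ann", "bob", "cat"},
    {"ann", "bob"},
    {"bob", "cat"},
    {"ann", "bob", "dan"},
]

print(frequent_pairs(meetings, 2))
```

With this data, only ann/bob (three shared containers) and bob/cat (two) reach the threshold $k = 2$.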

2 Answers

1

You can look into bipartite graphs: the item-container memberships form a graph of this type, with items on one side and containers on the other. The process of clustering highly connected nodes (loosely speaking) is called community detection, which is similar to frequent itemset mining.

KWillets
  • Yeah, it's possible to phrase this as a bipartite graph problem. I don't see how community detection could be used to solve it though. Can you please be a bit more specific? – Daniel Darabos Jul 08 '15 at 08:43
  • Community detection is a very broad term referring to clustering, clique-finding, etc. In this case a pair of items connected by k or more containers is a bipartite clique. – KWillets Jul 08 '15 at 19:45
  • But not all bipartite cliques represent such pairs, right? I'm not sure this is even a practical rephrasing of the problem. – Daniel Darabos Jul 09 '15 at 12:48
  • Well yes, they're not necessarily maximal or anything. And I agree this problem might not be usefully rephrased as a clique problem, since it's a very restricted case. Are you looking for useful algorithms, or, say, generalizations of the problem, or more interesting variations? My answer might help with the latter but I don't know what you're after. – KWillets Jul 09 '15 at 17:35
  • I have some algorithms and some variations. I submitted a (non-academic) conference abstract to talk about this topic calling it the "co-location problem". Then it struck me that perhaps that's not the official name, and maybe I've been missing out on a large body of prior work. So I'm looking for a name here, and maybe literature references. Thanks! – Daniel Darabos Jul 09 '15 at 17:44
0

I don't know if it has a special name. One algorithmic approach is matrix multiplication: let $M_{i,j}=1$ if item $i$ is in container $j$, and $M_{i,j}=0$ otherwise; then compute $N = M \cdot M^\top$. Since $N_{a,b} = \sum_j M_{a,j} M_{b,j}$ counts the containers that hold both item $a$ and item $b$, we have $N_{a,b} = \lvert C_{a,b} \rvert$, so it suffices to search for all off-diagonal cells of $N$ such that $N_{a,b}\ge k$. In addition to the problems you mention, it's also similar to the question Computing the "at least k friends in common" graph.
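As a rough sketch of the matrix formulation, the following computes $N = M \cdot M^\top$ with plain Python lists and reads off the qualifying cells above the diagonal. The helper names `gram_matrix` and `pairs_at_least_k` and the example matrix are assumptions for illustration. A real implementation would more likely use a dense or sparse matrix library; lists just keep the example dependency-free.

```python
def gram_matrix(M):
    """Compute N = M * M^T for a 0/1 incidence matrix given as a list of rows.

    Row a of M describes item a; column j describes container j.
    """
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in M]
            for row_a in M]


def pairs_at_least_k(M, k):
    """Return (a, b, N[a][b]) for all a < b with N[a][b] >= k."""
    N = gram_matrix(M)
    n = len(N)
    # Only the strict upper triangle is scanned: N is symmetric, and the
    # diagonal N[a][a] is just the number of containers holding item a.
    return [(a, b, N[a][b])
            for a in range(n)
            for b in range(a + 1, n)
            if N[a][b] >= k]


# Illustrative incidence matrix: 4 items (rows) and 4 containers (columns).
M = [
    [1, 1, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
]

print(pairs_at_least_k(M, 2))
```

Here items 0 and 1 share three containers and items 1 and 2 share two, so those are the only pairs reported for $k = 2$. Note that this still touches all $n^2$ cells of $N$, so it does not avoid enumerating pairs.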

D.W.
  • Thanks! For large $n$ it's not feasible to enumerate all pairs. Anyway I'm just looking for the name. The question you link is indeed related. Too bad it has no answer. – Daniel Darabos Jul 08 '15 at 08:41