I have a series of arrays:
[Apple,Banana,Cherry,Date]
[Apple,Fig,Grape]
[Banana,Cherry,Date,Elderberry]
[Fig,Grape]
and I would like to cluster them into groups based on how many words they share, for example:
Group1: Array1 and Array3 as they have 3 common words
Group2: Array2 and Array4 as they have 2 common words
etc.
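To make the overlap idea concrete, here is a minimal Python sketch of the pairwise counts I have in mind. The names `arrays`, `overlap`, and `pair_overlap` are only for illustration, and this just counts shared words; it is not a clustering method in itself.

```python
from itertools import combinations

# The four example arrays from above, keyed by an illustrative label.
arrays = {
    "Array1": ["Apple", "Banana", "Cherry", "Date"],
    "Array2": ["Apple", "Fig", "Grape"],
    "Array3": ["Banana", "Cherry", "Date", "Elderberry"],
    "Array4": ["Fig", "Grape"],
}

def overlap(a, b):
    """Return the number of distinct words that appear in both arrays."""
    return len(set(a) & set(b))

# Overlap count for every unordered pair of arrays.
pair_overlap = {
    (x, y): overlap(arrays[x], arrays[y])
    for x, y in combinations(arrays, 2)
}

# Print pairs from the largest overlap to the smallest.
for pair, count in sorted(pair_overlap.items(), key=lambda kv: -kv[1]):
    print(pair, count)
```

The two largest counts are the pairs I would expect to end up in Group1 and Group2.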
I was thinking of k-means, but there is not really a distance calculation here; the measure is more of an overlap between arrays.
Does anyone have any suggestions?
Thanks!