When each data point's coordinates are known, it is easy to apply clustering methods such as k-means. But if we only know the distances between each pair of data points, without knowing the exact coordinates of any data point, is it possible to apply any clustering method?

piratesailor

2 Answers

K-Medoids

It would be possible with K-Medoids, a variant of K-Means that restricts each cluster center to be one of the actual data points.

The tricky part with K-Means is that the centroids are generally not data points, so their distances to the data cannot be read from the distance matrix. However, you could start by assuming that some of your data points are the cluster centers. Then, when updating each center at every iteration, instead of computing the "imaginary" central position, pick the point in the cluster that minimizes the sum of distances to all other points in the cluster.
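The procedure described above can be sketched in plain Python. This is a minimal illustration rather than a tuned implementation: the function name `k_medoids`, its parameters, and the toy distance matrix (six points on a line) are all assumptions made for the example.

```python
def k_medoids(dist, initial_medoids, max_iter=100):
    """Cluster points using only a pairwise distance matrix.

    dist: square list of lists, dist[i][j] is the distance between points i and j.
    initial_medoids: indices of the data points used as starting centers.
    """
    n = len(dist)
    medoids = list(initial_medoids)
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest medoid.
        labels = [
            min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(n)
        ]
        # Update step: the new medoid is the cluster member with the
        # smallest sum of distances to the other members.
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if a cluster ends up empty.
                new_medoids.append(old)
                continue
            best = min(members, key=lambda m: sum(dist[m][j] for j in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break  # converged: medoids no longer change
        medoids = new_medoids
    return medoids, labels


# Toy data: only the distances are given to the algorithm.
positions = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in positions] for a in positions]

# Deliberately poor starting medoids (both in the left group).
medoids, labels = k_medoids(dist, initial_medoids=[0, 1])
print(medoids, labels)
```

Note that the coordinates in `positions` are used only to build `dist`; the clustering itself never sees them.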

Hierarchical Clustering

You could also try a hierarchical clustering method. One example is AgglomerativeClustering from Scikit-Learn. The idea is that you start with every point in its own cluster and repeatedly merge the pair of clusters with the smallest linkage distance. A stopping criterion, such as a target number of clusters or a distance threshold, then determines when the remaining clusters are "too far away" from each other to be merged.

This is the documentation for the fit() method. Notice that you can pass either the instance features or the distance matrix between instances (the latter when the estimator is configured for precomputed distances).

https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html#sklearn.cluster.AgglomerativeClustering.fit
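As a rough sketch of how this might look, the snippet below passes a precomputed distance matrix to AgglomerativeClustering. The toy data, the choice of average linkage, and the threshold of 5.0 are assumptions for illustration. The keyword that enables precomputed distances is `metric` in recent scikit-learn releases and was `affinity` in older ones, so the snippet checks which one is available. Ward linkage is not used here because it requires Euclidean features.

```python
import inspect

import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy data: build a distance matrix, then forget the coordinates.
positions = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
D = np.abs(positions[:, None] - positions[None, :])

# Pick the keyword supported by the installed scikit-learn version.
params = inspect.signature(AgglomerativeClustering).parameters
key = "metric" if "metric" in params else "affinity"

model = AgglomerativeClustering(
    n_clusters=None,          # let the threshold decide the number of clusters
    distance_threshold=5.0,   # clusters farther apart than this are not merged
    linkage="average",
    **{key: "precomputed"},
)
agglo_labels = model.fit_predict(D)
print(agglo_labels, model.n_clusters_)
```

Setting `n_clusters=2` and dropping `distance_threshold` would be the alternative if the number of clusters is known in advance.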

Bruno Lubascher

Yes, it is possible; however, not all algorithms support this. For example, k-means cannot do this, because it uses centroids, which are "imaginary" points in the space. Inferring the distance from a centroid to a point in the dataset is not possible without knowing the location of every data point. On the other hand, DBSCAN is able to do this, because the algorithm essentially takes the union of sets of points that are close to each other, which only requires pairwise distances.
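For instance, scikit-learn's DBSCAN accepts `metric="precomputed"`, in which case the input to `fit_predict` is read as a distance matrix. The sketch below uses made-up toy data, and the `eps` and `min_samples` values are arbitrary choices for this example.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy data: two tight groups and one isolated point.
positions = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0, 50.0])
D = np.abs(positions[:, None] - positions[None, :])

# With metric="precomputed", D is interpreted as pairwise distances.
db = DBSCAN(eps=1.5, min_samples=2, metric="precomputed")
db_labels = db.fit_predict(D)
print(db_labels)  # points labelled -1 are treated as noise
```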

In general, it helps to understand roughly how each algorithm works in order to "guess" whether this behaviour is supported. You can also check the documentation, for example sklearn's documentation.

Yohanes Alfredo