k-means is a family of cluster analysis methods in which you specify the number of clusters in advance. This contrasts with hierarchical cluster analysis methods.
Questions tagged [k-means]
452 questions
4
votes
3 answers
Boundary conditions for clustering
I have some data that I would like to cluster with k-means.
One of the features is the hour of the day.
The problem is that the hour '23' is considered far from the hour '0'.
How can I map the data so that the boundary will be cyclic?
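One common way to make a periodic feature cyclic is to place each hour on the unit circle and feed the two coordinates to k-means instead of the raw hour. The sketch below assumes a 24-hour period; `encode_hour` is a hypothetical helper name, not something from the question.

```python
import math

def encode_hour(hour, period=24):
    """Map an hour in [0, period) to a point on the unit circle."""
    angle = 2 * math.pi * hour / period
    return (math.cos(angle), math.sin(angle))

# Euclidean distances between encoded hours, which is what k-means would see.
d_23_0 = math.dist(encode_hour(23), encode_hour(0))
d_0_1 = math.dist(encode_hour(0), encode_hour(1))
d_0_12 = math.dist(encode_hour(0), encode_hour(12))

print(d_23_0, d_0_1, d_0_12)
```

With this encoding, hours 23 and 0 are exactly as close as hours 0 and 1, and hour 12 is the farthest point from hour 0. Note that the single hour feature becomes two columns, so its weight relative to other features may need rescaling.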

scf
3
votes
1 answer
Original k-Means Research Paper
I'm having difficulty finding the original published paper that proposed k-means as an algorithm. I would like to cite it as a reference for similar work, prompted by this TowardsDataScience article.
I have tried Wikipedia's references and Google…

StressedBoi3
3
votes
1 answer
Difference between cluster centers and means
The following is the output of the cluster centers I got from a cluster model (kmeans - 6 clusters)
3.371069, 3.920354, 3.629747, 3.700000, 3.988506, 3.740385
However, after segmenting the data into the 6 clusters and taking the average of the…

kiva
2
votes
1 answer
How is the Schwarz Criterion defined?
I am currently reading slides about the $k$-means algorithm. In the analysis, the professor writes
Minimize Schwarz Criterion: $W(C) + \lambda m k \log R$
$W(C)$ is Within-class scatter. I guess $\lambda$ is a weighting factor which has to be…

Martin Thoma
2
votes
1 answer
Implementation of kmeans clustering using R
I have implemented k-means clustering on the iris dataset (a built-in dataset) in R. The code is given…

user44436
1
vote
0 answers
How to find the feature/(data column) that separates each cluster of K-means?
I have a general question on applying k-means clustering to datasets.
How can I find the feature (data column) that separates each k-means cluster?
Perhaps using scikit-learn in Python.
Best Regards,
Swati

jaiswati_b
1
vote
1 answer
Scaling negative and positive variables when performing a k-means cluster analysis
I'm looking to perform a k-means cluster analysis on a data set whose variables take both positive and negative values. Given that the ranges vary so much, the data will need to be scaled, but my concern is with the variables…
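As a rough illustration of one scaling option, z-score standardization subtracts each column's mean and divides by its standard deviation, so columns with mixed signs are handled the same way as any other. This is a minimal sketch with made-up sample values; `standardize` is a hypothetical helper, and whether z-scoring suits the data in the question is an assumption.

```python
import statistics

def standardize(values):
    """Z-score a column: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

# Illustrative columns with very different ranges and mixed signs.
mixed_sign = [-250.0, -40.0, 0.0, 15.0, 900.0]
small_range = [-0.2, -0.1, 0.0, 0.1, 0.3]

scaled_a = standardize(mixed_sign)
scaled_b = standardize(small_range)
print(scaled_a)
print(scaled_b)
```

After scaling, both columns have mean 0 and standard deviation 1, so neither dominates the Euclidean distances that k-means uses. Scaled values can still be negative, which is not a problem for k-means.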

Jeff
1
vote
2 answers
Theoretical work on validity of restricting movement of Centroid of K-Mean
I recently received a manuscript for review in which the author used ~1000 "fake" data points so that the final k-means centroid stays within the required range. Neither the author nor I seem to have a background in data science and the paper is more…

Joe89
1
vote
2 answers
K-means sensitivity to outliers?
I'm studying K-means, and one important drawback of K-means is the lack of robustness to outliers. My question is: are there any cases when the lack of robustness to outliers may be considered not as a defect of K-means but as a virtue instead?

dxdydz
1
vote
1 answer
k-means with one cluster
K-means may give different results, because the initial choice of centroids is random.
However, if I were to choose k=1, will the algorithm always provide the same answer equal to the "barycentre" of my data?
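A quick way to check this intuition is to run a plain Lloyd-style k-means with several random seeds and compare the single centroid to the arithmetic mean. The sketch below is a toy 1-D implementation written for illustration (`kmeans_1d` is a hypothetical name), not the code of any particular library.

```python
import random

def kmeans_1d(points, k, seed, iters=50):
    """Plain Lloyd's algorithm on 1-D data with random initial centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(k), key=lambda j: (p - centroids[j]) ** 2)
            clusters[nearest].append(p)
        # Recompute centroids; keep the old one if a cluster is empty.
        centroids = [
            sum(c) / len(c) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
    return sorted(centroids)

data = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
mean = sum(data) / len(data)

# Different seeds give different starting centroids.
results = {seed: kmeans_1d(data, k=1, seed=seed) for seed in range(5)}
print(mean, results)
```

With k=1 every point is assigned to the only cluster, so the first update already moves the centroid to the mean of all points, whatever the starting position was.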

user
1
vote
0 answers
How to calculate the purity of K-Means clustering
I am trying to work out how to calculate the purity of a k-means clustering. I have a labelled dataset that I want to cluster with scikit-learn k-means. The label's column name is "Classes".
I don't want the labels to interfere with the clustering so I drop the numeric label (range 1-7) and run…
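For reference, purity is usually computed by crediting each cluster with its most frequent true class, summing those counts, and dividing by the number of points. The sketch below uses made-up labels and cluster assignments; in practice `clusters` would be the cluster ids returned by the fitted model, and `purity` is a hypothetical helper name.

```python
from collections import Counter

def purity(true_labels, cluster_ids):
    """Fraction of points that belong to the majority class of their cluster."""
    by_cluster = {}
    for label, cid in zip(true_labels, cluster_ids):
        by_cluster.setdefault(cid, Counter())[label] += 1
    # Each cluster contributes the count of its most common true class.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(true_labels)

# Illustrative data: true classes and the cluster each point landed in.
classes = [1, 1, 1, 2, 2, 3, 3, 3]
clusters = [0, 0, 1, 1, 1, 2, 2, 2]

score = purity(classes, clusters)
print(score)  # 0.875
```

Here clusters 0 and 2 are pure, while cluster 1 holds two points of class 2 and one of class 1, so 7 of the 8 points match their cluster's majority class.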

Bryon
0
votes
1 answer
color compression using k-means algorithm
I am reading about the k-means algorithm at this link.
At ln[22], the author mentions that the input color space has 16 million possible colors. How did the author arrive at the figure of 16 million? Kindly explain.
Additionally at the end it is mentioned as…
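The 16 million figure is most likely the size of a standard 24-bit RGB color space. Assuming 8 bits per channel (an assumption, since the linked notebook is not reproduced here), the arithmetic is:

```python
# Assumption: each of the three channels (R, G, B) is stored in 8 bits.
bits_per_channel = 8
levels_per_channel = 2 ** bits_per_channel   # 256 intensity levels
total_colors = levels_per_channel ** 3       # one level per channel, three channels

print(levels_per_channel, total_colors)  # 256 16777216
```

So 256 × 256 × 256 = 16,777,216, which is commonly rounded to "16 million".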

VRK
0
votes
3 answers
Confused about how to graph my high dimensional dataset with Kmeans
PLEASE NO SKLEARN ANSWERS
So I have a dataset that is very high dimensional, and I am very confused about how to convert it into a form that can be plotted with k-means. Here is an example of what my dataset looks like:
Blk Students % White…

vladimir_putin
0
votes
1 answer
What is normalized winning frequency in kernel self organizing map(SOM)?
In the k-means based kernel SOM, proposed by MacDonald and Fyfe (2000), the update of the mean is based on a soft learning algorithm
$m_i(t + 1) = m_i(t) + \Lambda[\phi(x) - m_i(t)]$
where $\Lambda$ is the normalized winning frequency of the $i$-th mean and is defined…