Highest Voted 'k-means' Questions - Data Science Stack Exchange

4

votes

3 answers

Boundary conditions for clustering

I have some data that I would like to cluster with k-means. One of the features is the hour of the day. The problem is that the hour '23' is considered far from the hour '0'. How can I map the data so that the boundary will be cyclic?

k-means

asked Nov 11 '15 at 14:27

scf

185
1
5
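One common way to handle the cyclic hour feature asked about above is to replace the hour with two features, its cosine and sine on a 24-hour circle. The sketch below is only an illustration using the standard library; the helper name `encode_hour` is hypothetical and not from the question.

```python
import math

def encode_hour(hour, period=24.0):
    """Map an hour in [0, period) to a point on the unit circle."""
    angle = 2.0 * math.pi * (hour % period) / period
    return math.cos(angle), math.sin(angle)

# Euclidean distances between encoded hours (what k-means would see).
d_23_0 = math.dist(encode_hour(23), encode_hour(0))
d_0_1 = math.dist(encode_hour(0), encode_hour(1))
d_0_12 = math.dist(encode_hour(0), encode_hour(12))

# 23 -> 0 is now exactly as far as 0 -> 1, and 0 -> 12 is the farthest pair.
print(d_23_0, d_0_1, d_0_12)
```

The distance between two encoded hours is the chord length rather than the arc length, so it grows monotonically with the circular gap but not linearly. The two new columns would normally be scaled together with the other features.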

3

votes

1 answer

Original k-Means Research Paper

I'm having difficulty finding the original published paper that proposed k-means as an algorithm. I want to cite it as a reference for similar work, inspired by this TowardsDataScience article. I have tried Wikipedia's references and Google…

k-means

asked Apr 12 '21 at 19:36

StressedBoi3

43
5

3

votes

1 answer

Difference between cluster centers and means

The following are the cluster centers I got from a cluster model (k-means, 6 clusters): 3.371069, 3.920354, 3.629747, 3.700000, 3.988506, 3.740385. However, after segmenting the data into the 6 clusters and taking the average of the…

k-means

asked Mar 19 '19 at 04:14

kiva

31
1

2

votes

1 answer

How is the Schwarz Criterion defined?

I am currently reading slides about the $k$-means algorithm. In the analysis, the professor writes: minimize the Schwarz Criterion $W(C) + \lambda m k \log R$, where $W(C)$ is the within-class scatter. I guess $\lambda$ is a weighting factor which has to be…

k-means

asked Dec 02 '15 at 22:16

Martin Thoma

18,880
35
95
169

2

votes

1 answer

Implementation of kmeans clustering using R

I have implemented k-means clustering on the iris dataset (a built-in dataset) in R. The code is given…

k-means

asked Jan 10 '18 at 16:54

user44436

21
2

1

vote

0 answers

How to find the feature/(data column) that separates each cluster of K-means?

I have a general question about applying k-means clustering to datasets: how do I find the feature (data column) that separates each k-means cluster, perhaps using scikit-learn in Python? Best regards, Swati

k-means

asked Oct 17 '19 at 07:13

jaiswati_b

21
3

1

vote

1 answer

Scaling negative and positive variables when performing a k-means cluster analysis

I'm looking to perform a k-means cluster analysis on a data set whose variables take both positive and negative values. Given that the ranges vary so much, the data will need to be scaled, but my concern is with the variables…

k-means

asked Jul 03 '19 at 19:50

Jeff

131
1
5

1

vote

2 answers

Theoretical work on validity of restricting movement of Centroid of K-Mean

I recently received a manuscript for review in which the author used ~1000 "fake" data points so that the final k-means centroid stays within the required range. Neither the author nor I seem to have a background in data science, and the paper is more…

k-means

asked Oct 29 '18 at 01:00

Joe89

11
2

1

vote

2 answers

K-means sensitivity to outliers?

I'm studying K-means, and one important drawback of K-means is the lack of robustness to outliers. My question is: are there any cases when the lack of robustness to outliers may be considered not as a defect of K-means but as a virtue instead?

k-means

asked Oct 07 '18 at 23:32

dxdydz

11
2

1

vote

1 answer

k-means with one cluster

K-means may give different results, because the initial choice of centroids is random. However, if I were to choose k=1, will the algorithm always provide the same answer equal to the "barycentre" of my data?

k-means

asked Feb 21 '18 at 02:40

user

1,993
6
21
38
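For the k=1 question above, a minimal sketch of Lloyd's iteration (the standard k-means loop with squared Euclidean distance, assumed here) shows why the starting centroid does not matter: with one centroid every point is assigned to it, so the update step returns the mean of all the data. The function name `lloyd` and the sample points are hypothetical.

```python
def lloyd(points, centroids, iterations=10):
    """Plain Lloyd iteration: assign each point to its nearest centroid, then average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids

points = [(0.0, 0.0), (2.0, 0.0), (2.0, 4.0), (0.0, 4.0)]

# Two very different starting centroids, both with k = 1.
from_far_away = lloyd(points, [(100.0, -50.0)])
from_a_data_point = lloyd(points, [(0.0, 0.0)])
print(from_far_away, from_a_data_point)
```

Both runs end at the barycentre (1.0, 2.0) of the four points after the first update.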

1

vote

0 answers

How to calculate the purity of K-Means clustering

I am trying to work out how to calculate the purity of a clustering. I have a labelled dataset that I want to cluster with scikit-learn k-means. The label's column name is "Classes". I don't want the labels to interfere with the clustering, so I drop the numeric label (range 1-7) and run…

k-means

asked Apr 17 '22 at 11:24

Bryon

111
4
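Purity, as usually defined, is the fraction of points that belong to the majority class of their assigned cluster. The sketch below assumes that definition and uses made-up cluster assignments and class labels; the function name `purity` is hypothetical, and the true labels are used only for evaluation, never for clustering.

```python
from collections import Counter, defaultdict

def purity(cluster_ids, true_classes):
    """Sum the majority-class count of every cluster and divide by the number of points."""
    classes_by_cluster = defaultdict(list)
    for cluster, cls in zip(cluster_ids, true_classes):
        classes_by_cluster[cluster].append(cls)
    majority_total = sum(
        Counter(members).most_common(1)[0][1]
        for members in classes_by_cluster.values()
    )
    return majority_total / len(cluster_ids)

# Made-up example: 6 points, 2 clusters, class labels in the 1-7 range.
cluster_ids = [0, 0, 0, 1, 1, 1]
true_classes = [1, 1, 2, 2, 2, 3]
score = purity(cluster_ids, true_classes)
print(score)
```

Each cluster here has a majority class covering 2 of its 3 points, so the score is 4/6. Purity rises trivially as the number of clusters grows, so it is best compared across runs with the same k.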

0

votes

1 answer

color compression using k-means algorithm

I am reading about the k-means algorithm at this link. At ln[22], the author mentions that the input color space has 16 million possible colors. How did the author come up with the 16 million figure? Kindly explain. Additionally at the end it is mentioned as…

k-means

asked Nov 05 '19 at 13:12

VRK

11
2
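A figure of "16 million colors" usually comes from 24-bit RGB: three channels with 8 bits each. Assuming the linked notebook uses that representation (the excerpt does not say), the arithmetic is:

```python
# Assumption: 8 bits per channel, three channels (red, green, blue).
levels_per_channel = 2 ** 8            # 256 intensity levels
total_colors = levels_per_channel ** 3
print(total_colors)                    # 16777216
```

Color compression with k-means then replaces each pixel with the nearest of k cluster centers in this 3-dimensional color space.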

0

votes

3 answers

Confused about how to graph my high dimensional dataset with Kmeans

PLEASE NO SKLEARN ANSWERS. I have a dataset that is very high dimensional, and I am very confused about how to convert it into a form that can be plotted with k-means. Here is an example of what my dataset looks like: Blk Students % White…

k-means

asked Oct 20 '18 at 18:06

vladimir_putin

1
1

0

votes

1 answer

What is normalized winning frequency in kernel self organizing map(SOM)?

In the k-means based kernel SOM proposed by MacDonald and Fyfe (2000), the update of the mean is based on a soft learning algorithm, $m_i(t + 1) = m_i(t) + \Lambda[\phi(x) - m_i(t)]$, where $\Lambda$ is the normalized winning frequency of the $i$-th mean and is defined…

k-means

asked Mar 01 '24 at 18:02

Anshuman Jayaprakash

3
2
