Questions tagged [clustering]

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, and information retrieval.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.

In machine learning and data mining, clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and it is commonly used in exploratory data analysis. Popular algorithms include k-means, expectation maximization (EM), spectral clustering, correlation clustering, and hierarchical clustering.
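To make the k-means entry above concrete, here is a minimal pure-Python sketch of Lloyd's algorithm, the usual iterative procedure behind k-means. The function name `kmeans`, the seeded random initialization, and the toy points are illustrative assumptions; real libraries add smarter seeding (such as k-means++) and several restarts.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate an assignment step and a centroid-update step."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct input points as initial centroids
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep the centroid of an empty cluster
        if new_centroids == centroids:  # converged: assignments can no longer change
            break
        centroids = new_centroids
    return labels, centroids

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Each iteration cannot increase the within-cluster sum of squared distances, which is why the loop settles; it only finds a local optimum, hence the usual advice to try several initializations.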

Related topics: classification, pattern-recognition, knowledge discovery, taxonomy. Not to be confused with cluster computing.

1396 questions

2 answers

Clustering unique visitors by useragent, ip, session_id

Given website access data in the form session_id, ip, user_agent, and optionally timestamp, and subject to the conditions below, how would you best cluster the sessions into unique visitors? session_id: an id given to every new visitor. It does not…

clustering

asked May 15 '14 at 09:04

AdrianBR

3 answers

How to evaluate clustering success in a completely unsupervised system?

The algorithm in question is Kohonen's self-organizing map (SOM), but the question could also apply to PCA and some others. When the U-matrix (or the codebook?) is examined, is there a way to tell how successful the clustering was? And would it be a good idea to apply GAs to…

clustering

asked Jul 09 '15 at 04:44

AkKoh
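One common way to score a clustering without ground-truth labels is an internal validity index such as the silhouette coefficient. For each point it compares a, the mean distance to the other members of its own cluster, with b, the mean distance to the nearest other cluster, via s = (b - a) / max(a, b); values near 1 indicate compact, well-separated clusters and negative values indicate likely misassignments. The sketch is plain Python and not SOM-specific: it assumes you already have a cluster label per point (for example, inputs grouped by SOM unit) and at least two clusters. The name `silhouette` is hypothetical.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient; needs no ground truth, only labels and distances."""
    scores = []
    clusters = set(labels)
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: silhouette is conventionally 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(  # mean distance to the closest other cluster (requires >= 2 clusters)
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette(pts, [0, 0, 1, 1]))  # close to 1: compact, well-separated clusters
print(silhouette(pts, [0, 1, 0, 1]))  # negative: points sit closer to the other cluster
```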

4 answers

Why does OPTICS use the core-distance as a minimum for the reachability distance?

The OPTICS clustering algorithm defines $$\text{core-dist}_{\varepsilon,MinPts}(p)=\begin{cases}\text{UNDEFINED} & \text{if } |N_\varepsilon(p)| < MinPts\\ MinPts\text{-th smallest distance to } N_\varepsilon(p) &…

clustering

asked May 07 '16 at 12:37

Martin Thoma

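The definitions quoted in the question can be written out directly. A minimal Python sketch, assuming (as in the DBSCAN/OPTICS formulation) that the ε-neighbourhood N_ε(p) includes p itself, so p counts toward MinPts, and using `None` for UNDEFINED; the function names are hypothetical. The `max` in `reachability_distance` is the clamp the question asks about: every point closer to p than p's core-distance receives the same reachability value, which is one commonly cited way OPTICS damps small-scale density fluctuations in the reachability plot.

```python
import math

def neighbors(points, p, eps):
    """Epsilon-neighbourhood of p (assumed here to include p itself)."""
    return [q for q in points if math.dist(p, q) <= eps]

def core_distance(points, p, eps, min_pts):
    """MinPts-th smallest distance from p within its eps-neighbourhood, else None."""
    n = neighbors(points, p, eps)
    if len(n) < min_pts:
        return None  # UNDEFINED: p is not a core point
    return sorted(math.dist(p, q) for q in n)[min_pts - 1]

def reachability_distance(points, o, p, eps, min_pts):
    """Reachability of o from p: never smaller than p's core-distance."""
    cd = core_distance(points, p, eps, min_pts)
    if cd is None:
        return None  # UNDEFINED when p is not a core point
    return max(cd, math.dist(p, o))

pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
print(core_distance(pts, (0.0, 0.0), eps=3.0, min_pts=3))                    # 2.0
print(reachability_distance(pts, (1.0, 0.0), (0.0, 0.0), eps=3.0, min_pts=3))  # 2.0, not 1.0
```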

1 answer

Clustering based on categorical variables?

I am working on a project and am currently experimenting with cluster analysis. The dataset consists mainly of categorical variables and discrete numbers. Please pardon my poor programming skills as I am not very familiar with MathJax, but I will try to summarize…

clustering

asked Jun 28 '16 at 20:23

Jing
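For purely categorical records, one common choice is the simple matching dissimilarity used by the k-modes algorithm: count the attributes on which two records differ, and use the attribute-wise mode in place of a mean centroid. A minimal sketch under that assumption (function names are hypothetical); for a mix of categorical and numeric columns, Gower's distance is a frequently used alternative.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Simple matching: the number of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))

def mode_record(records):
    """Attribute-wise mode, which plays the role of a centroid in k-modes."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

def assign(records, modes):
    """Assignment step of k-modes: each record goes to the mode with fewest mismatches."""
    return [min(range(len(modes)), key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records]

print(matching_dissimilarity(("red", "small", "yes"), ("red", "large", "no")))  # 2
print(mode_record([("a", "x"), ("a", "y"), ("b", "y")]))                        # ('a', 'y')
```

Alternating `assign` with recomputing `mode_record` per cluster gives the k-modes loop, mirroring how k-means alternates assignment and mean updates.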

2 answers

Is this cluster analysis / prediction?

I have a series of seemingly random data arriving one value at a time. Although it appears to be random, the data forms clusters when certain attributes are analysed, as the charts show. I'm trying to avoid the fallacy of seeing…

clustering

asked Mar 03 '16 at 00:21

user3791372

1 answer

Interpret clustering results after variable transformation

For some time I have had a question to which I have not yet found a proper answer. My doubt concerns the interpretation of the results of a clustering algorithm that was run on log-transformed features. Specifically, let's…

clustering

asked Dec 10 '17 at 18:40

Seymour
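A useful fact when reading centroids computed on log-transformed features: a mean of log values, mapped back with exp, is the geometric mean of the original values, not the arithmetic mean. A minimal sketch, assuming a plain natural-log transform of strictly positive values (a log1p transform would need expm1 instead); the helper names and toy cluster are hypothetical.

```python
import math
import statistics

def centroid(rows):
    """Arithmetic mean of each column."""
    return [sum(col) / len(col) for col in zip(*rows)]

def back_transformed_centroid(raw_rows):
    """Centroid computed on log features, mapped back to the original scale."""
    log_rows = [[math.log(v) for v in row] for row in raw_rows]  # requires every v > 0
    return [math.exp(c) for c in centroid(log_rows)]

cluster = [[1.0, 2.0], [10.0, 8.0], [100.0, 32.0]]
print(back_transformed_centroid(cluster))                         # approximately [10.0, 8.0]
print([statistics.geometric_mean(col) for col in zip(*cluster)])  # the same geometric means
print(centroid(cluster))                                          # [37.0, 14.0]
```

So a back-transformed centroid describes a "typical" member multiplicatively, and differences between clusters in log space read as ratios on the original scale.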

2 answers

When are centering and scaling needed before doing hierarchical clustering?

I am working on a clustering project where we have collected protein data from over 100 patient samples. This data is normalized and log transformed. The goal is to cluster samples based on their similarities; I am using hierarchical clustering and…

clustering

asked Aug 17 '17 at 17:50

Mdhale
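With Euclidean distance, a feature measured on a larger numeric scale dominates the distances, so per-feature centering and scaling (z-scores) is the usual way to give every feature equal weight. Whether that is desirable depends on the data: if all features are already on a common, comparable scale, rescaling can also inflate noise in low-variance features. A minimal sketch; the function name and toy data are hypothetical.

```python
import statistics

def zscore_columns(rows):
    """Center each feature (column) at 0 and scale it to unit sample standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]  # assumes no constant column (sd > 0)
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

# Two features on very different scales: the second would dominate raw Euclidean distances.
samples = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
print(zscore_columns(samples))
```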

2 answers

Is it possible to run clustering methods knowing only the distances between pairs of points?

When each data point's coordinates are known, it is easy to apply clustering methods such as k-means. But what if we only know the distance between each pair of data points, without knowing the exact coordinates of every data…

clustering

asked Dec 29 '19 at 06:30

piratesailor
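Several algorithms need only pairwise distances: agglomerative (hierarchical) clustering, k-medoids and DBSCAN, for instance, whereas k-means needs coordinates to compute cluster means. A minimal sketch of single-linkage agglomerative clustering that takes nothing but a symmetric distance matrix; the function name and the toy matrix are illustrative. Libraries such as scikit-learn also accept precomputed distance matrices in some estimators, e.g. `DBSCAN(metric="precomputed")`.

```python
def single_linkage(dist, n_clusters):
    """Merge clusters until n_clusters remain, using only the distance matrix."""
    clusters = [{i} for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a is unaffected by the pop
        clusters[a] |= merged
    return clusters

# Toy example: pairwise distances between four objects (no coordinates needed).
dist = [
    [0, 1, 9, 10],
    [1, 0, 8, 9],
    [9, 8, 0, 1],
    [10, 9, 1, 0],
]
print(single_linkage(dist, 2))  # [{0, 1}, {2, 3}]
```

This brute-force version is cubic or worse in the number of points; it is meant to show that no coordinates are needed, not to be efficient.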

1 answer

Finding clusters in multidimensional data

I have a set of data from 3,000 records. There are 5 attributes per individual (labelled A - E). I can use Kendall's W (coefficient of concordance) to determine the concordance between any two records. What I require is a way to discern any…

clustering

asked Sep 07 '18 at 00:40

Carl
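For reference, Kendall's W for m rankings of the same n items (no ties) is W = 12S / (m^2 (n^3 - n)), where R_i is the sum of the ranks given to item i, R̄ = m(n + 1)/2, and S is the sum of (R_i - R̄)^2 over all items. A minimal sketch, assuming each record's five attribute values (A to E) can be converted to ranks without ties; the helper names and sample values are hypothetical. A pairwise quantity such as 1 - W could then serve as a dissimilarity for any clustering method that accepts a precomputed distance matrix.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def kendalls_w(rankings):
    """Kendall's W for m rankings (rows) of the same n items (columns), no ties."""
    m, n = len(rankings), len(rankings[0])
    rank_sums = [sum(col) for col in zip(*rankings)]  # R_i for each item
    mean_sum = m * (n + 1) / 2                        # R-bar, the mean of the R_i
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))

record_a = [3.1, 0.4, 7.7, 2.2, 5.0]  # hypothetical attribute values A-E
record_b = [2.9, 0.1, 8.0, 1.5, 6.3]
print(kendalls_w([ranks(record_a), ranks(record_b)]))  # 1.0: identical orderings
```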

2 answers

What is spectral clustering?

What is spectral clustering? I have little background in statistics. I have tried searching for notes online, but they assume quite a lot of knowledge. It would be good if you could find some notes online that teach the basics and the math…

clustering

asked Jun 21 '18 at 01:27

listener
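The core idea of spectral clustering fits in a short program: build a similarity graph, form its Laplacian, and cluster points using the Laplacian's eigenvectors with the smallest eigenvalues. The sketch below covers only the two-cluster case and is an illustrative assumption rather than a standard library routine: it uses a Gaussian similarity with an assumed `sigma`, the unnormalized Laplacian L = D - W, and power iteration (instead of a linear-algebra library) to approximate the Fiedler vector, then splits points by its sign. The general k-cluster version runs k-means on the first k eigenvectors instead.

```python
import math
import random

def spectral_bipartition(points, sigma=1.0, iters=500, seed=0):
    """Split points in two using the Fiedler vector of the unnormalized graph Laplacian."""
    n = len(points)
    # Gaussian (RBF) similarity graph; no self-loops.
    w = [[0.0 if i == j else math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]
    # Unnormalized graph Laplacian L = D - W (symmetric, positive semidefinite).
    lap = [[(deg[i] if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]
    # Eigenvalues of L lie in [0, 2 * max degree], so M = c*I - L is positive semidefinite
    # and its largest eigenvalues correspond to the smallest eigenvalues of L.
    c = 2 * max(deg)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant eigenvector (eigenvalue 0 of L)
        v = [c * v[i] - sum(lap[i][j] * v[j] for j in range(n)) for i in range(n)]  # v <- M v
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # The sign pattern of the (approximate) Fiedler vector gives the two clusters.
    return [0 if x < 0 else 1 for x in v]

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(spectral_bipartition(points))
```

Which group is labelled 0 and which 1 depends on the random start; only the partition is meaningful.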

1 answer

Can someone explain how PCA is relevant to extracting the parameters of Gaussian mixture models?

I am having some difficulty seeing the connection between PCA on the second-order moment matrix and estimating the parameters of Gaussian mixture models. Can anyone connect the two?

clustering

asked Nov 23 '14 at 02:27

tejaswi

1 answer

Using SVD for clustering

The dataset that I am experimenting with is in the form of a table with columns userid and itemid. If there is a row for a given user and a given item, that means the user accessed the item (like in an online store). I am trying to cluster similar…

clustering

asked Oct 02 '14 at 03:30

rbk

3 answers

Clustering a data set based on the similarity of tree structures

I have a data set (>5000). Each individual record is structured as a multilevel n-ary tree (>200 nodes). The tree node identifiers are unique within a tree, but the same identifiers are used to represent the same type of node across the…

clustering

asked Jun 30 '21 at 12:31

user3691191
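Since node identifiers carry the same meaning across trees, one simple (and admittedly coarse) representation is the set of parent-child identifier pairs in each tree; the Jaccard similarity of two such sets lies in [0, 1], and 1 minus it can feed any clustering method that accepts pairwise distances. This is an illustrative assumption, not a full tree-edit distance; the dictionary-of-children tree format and the function names are hypothetical.

```python
def edge_set(tree, root):
    """Collect (parent, child) identifier pairs from a tree given as {node: [children]}."""
    edges, stack = set(), [root]
    while stack:
        node = stack.pop()
        for child in tree.get(node, []):
            edges.add((node, child))
            stack.append(child)
    return edges

def jaccard(a, b):
    """Size of the intersection over size of the union (1.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 1.0

t1 = {"r": ["a", "b"], "a": ["c"]}
t2 = {"r": ["a", "b"], "a": ["d"]}
print(jaccard(edge_set(t1, "r"), edge_set(t2, "r")))  # 0.5: 2 shared edges out of 4
```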

3 answers

Expected number of points in initial clustering for LSH

I have a very skewed, 10-dimensional data set. I need approximate nearest neighbours for my use case, and I was looking into locality-sensitive hashing. However, after scaling and randomly generating hyperplanes through the origin and coding the data…

clustering

asked Jan 08 '16 at 10:58

Jan van der Vegt

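Random-hyperplane LSH (sign random projections) assigns one bit per hyperplane through the origin, so a point's code depends only on its direction from the origin; for two vectors at angle θ, each bit agrees with probability 1 - θ/π. If skewed data points in similar directions from the origin (for example, all-positive features), many points can share a bucket. A minimal sketch; the function names, the Gaussian hyperplane normals and the parameter values are assumptions for illustration.

```python
import random
from collections import defaultdict

def random_hyperplanes(dim, n_planes, seed=0):
    """Normals of random hyperplanes through the origin (Gaussian entries)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]

def hyperplane_hash(x, planes):
    """One bit per hyperplane: 1 if x lies on the non-negative side, else 0."""
    return tuple(int(sum(p_i * x_i for p_i, x_i in zip(p, x)) >= 0) for p in planes)

def bucket(points, planes):
    """Group point indices by their hash code."""
    buckets = defaultdict(list)
    for idx, x in enumerate(points):
        buckets[hyperplane_hash(x, planes)].append(idx)
    return buckets

planes = random_hyperplanes(dim=10, n_planes=8)
x = [float(i + 1) for i in range(10)]
# Scaling a point by a positive factor never changes its code: only direction matters.
print(hyperplane_hash(x, planes) == hyperplane_hash([2 * v for v in x], planes))  # True
```

Because of this scale invariance, how the data sits relative to the origin (for instance, whether it was mean-centred) can strongly affect how evenly points spread over buckets.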

3 answers

Clustering based on features of varied importance

Suppose I have a dataset that includes the following features {HairColor, EyeColor, EducationLevel, Income}. I would like to perform clustering to separate the dataset into smaller datasets that you would expect to behave similarly. The difficulty…

clustering

asked Apr 19 '21 at 13:32

Gilad Felsen
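One way to let some features matter more is a weighted dissimilarity, for example a weighted variant of Gower's distance: categorical features contribute 0 or 1 (match or mismatch), numeric features contribute their absolute difference divided by the feature's range, and the per-feature terms are combined with importance weights. A minimal sketch; the function name, the `kinds`/`ranges` arguments, the ordinal encoding of EducationLevel, and the weights are all hypothetical choices for illustration.

```python
def weighted_gower(a, b, kinds, ranges, weights):
    """Weighted Gower-style dissimilarity in [0, 1] for mixed feature types."""
    total = 0.0
    for x, y, kind, r, w in zip(a, b, kinds, ranges, weights):
        if kind == "categorical":
            d = 0.0 if x == y else 1.0
        else:  # numeric (or ordinal encoded as a number), scaled by the feature's range
            d = abs(x - y) / r if r else 0.0
        total += w * d
    return total / sum(weights)

# Features: HairColor, EyeColor, EducationLevel (encoded 0-4), Income.
person_a = ("brown", "blue", 2, 50_000)
person_b = ("brown", "green", 3, 70_000)
kinds = ("categorical", "categorical", "numeric", "numeric")
ranges = (None, None, 4, 100_000)  # assumed observed range of each numeric feature
weights = (1, 1, 2, 4)             # hypothetical importances: Income counts most
print(weighted_gower(person_a, person_b, kinds, ranges, weights))  # approximately 0.2875
```

The resulting pairwise dissimilarities can be fed to any method that works from a distance matrix, such as hierarchical clustering or k-medoids.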
