Questions tagged [classification]

An instance of supervised learning that identifies the category or categories to which a new instance of a dataset belongs.

In machine learning and statistics, classification refers to the problem of predicting category memberships based on a set of pre-labeled examples. It is thus a type of supervised learning.

Some of the most important classification algorithms are support vector machines [svm], logistic regression, naive Bayes, random forests [random-forest], and artificial neural networks [neural-network].

When we wish to associate inputs with continuous values in a supervised framework, the problem is instead known as regression. The unsupervised counterpart to classification is known as clustering (or cluster analysis), and involves grouping data into categories based on some measure of inherent similarity.
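To make the definition concrete, here is a minimal sketch of one very simple classifier, a nearest-centroid rule. It was chosen only for brevity and is not one of the algorithms listed above. It learns from pre-labeled examples by averaging each class, then assigns a new instance to the class with the closest average. The function names and data are illustrative.

```python
import math
from collections import defaultdict

def fit_centroids(X, y):
    """Average the pre-labeled examples of each class into one centroid."""
    sums = {}
    counts = defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))

X = [[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [3.8, 4.0]]
y = ["a", "a", "b", "b"]
centroids = fit_centroids(X, y)
print(predict(centroids, [1.1, 1.0]))  # a
print(predict(centroids, [4.1, 3.9]))  # b
```

Replacing the class labels with continuous targets would turn this into a regression problem. Dropping the labels altogether and searching for the groups would make it clustering.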

3281 questions

6 answers

Cosine similarity versus dot product as distance metrics

It looks like the cosine similarity of two features is just their dot product scaled by the product of their magnitudes. When does cosine similarity make a better distance metric than the dot product? I.e. do the dot product and cosine similarity…

classification

asked Jul 15 '14 at 21:30

ahoffer
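A small sketch of the relationship the question describes. Cosine similarity is the dot product divided by the product of the two vectors' norms, so it depends only on the angle between them. The raw dot product also grows with magnitude. The vectors below are illustrative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Dot product divided by the product of the two vectors' Euclidean norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [1.0, 2.0]
b = [2.0, 4.0]    # same direction as a, twice the magnitude
c = [10.0, 20.0]  # same direction as a, ten times the magnitude

print(dot(a, b), dot(a, c))                              # 10.0 50.0
print(cosine_similarity(a, b), cosine_similarity(a, c))  # both approximately 1.0
```

If magnitude carries meaning for your features, such as raw counts, the dot product keeps that information. If only direction should matter, cosine similarity removes it. Strictly, neither is a distance; 1 minus the cosine similarity is often used as a dissimilarity.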

2 answers

Finding optimal threshold in multi-class classification task

In a binary classification problem, it is easy to find the optimal threshold (for F1) by trying different thresholds, evaluating each, and picking the one with the highest F1. Similarly, is there a proper way to find optimal thresholds for all the…

classification

asked Jul 06 '20 at 21:01

saiRegrefree
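The binary procedure the question describes can be sketched as follows. Every distinct score is tried as a cut-off, and the one with the highest F1 for the positive class is kept. The labels and scores are made up for illustration.

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1 for the positive class when scores >= threshold are predicted positive."""
    tp = fp = fn = 0
    for t, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and t == 1:
            tp += 1
        elif pred == 1 and t == 0:
            fp += 1
        elif pred == 0 and t == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_threshold(y_true, scores):
    """Try every distinct score as a cut-off and keep the one with the highest F1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda th: f1_at_threshold(y_true, scores, th))

y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
th = best_threshold(y_true, scores)
print(th, f1_at_threshold(y_true, scores, th))  # 0.35 and 6/7 (about 0.857)
```

The threshold should be chosen on a validation split, not on the data used to report the final score. For the multi-class case, one possible heuristic is to run the same search separately on each class's one-vs-rest score. That heuristic is offered here as an assumption, not as the accepted answer.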

3 answers

How can I classify text considering word order, instead of just using a bag-of-words approach?

I've made a Naive Bayes classifier that uses the bag-of-words technique to classify spam posts on a message board. It works, but I think I could get much better results if my models considered the word orderings and phrases. (ex: 'girls' and 'live'…

classification

asked Oct 02 '14 at 23:15

Yerk
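One common step beyond bag-of-words is to add contiguous word n-grams, here bigrams, as extra features. A Naive Bayes classifier can consume these unchanged. In the sketch below, the function name `ngram_features` and the example phrases are illustrative. Both phrases produce the same unigram counts, but their bigrams differ.

```python
from collections import Counter

def ngram_features(text, n_max=2):
    """Count unigrams and contiguous n-grams up to n_max, so local word order partly survives."""
    tokens = text.lower().split()
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats

a = ngram_features("girls live here")
b = ngram_features("live girls here")
print(a["girls live"], b["girls live"])  # 1 0
```

The feature space grows quickly with `n_max`, so rare n-grams are often pruned before training.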

3 answers

AUC-ROC of a random classifier

Why is the area under the ROC curve for a random classifier equal to 0.5, and why is the curve diagonal? To me, a random classifier would have 25% each of TP, TN, FP, and FN, and therefore it would only be a single point on the ROC curve.

classification

asked May 20 '18 at 06:12

Victor
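The ROC curve is traced by sweeping the decision threshold, not by a single operating point. A scorer that ignores the label flags the same fraction of positives and negatives at every threshold, so TPR ≈ FPR and the points lie on the diagonal, whose area is 0.5. The single point in the question corresponds to one threshold on that diagonal (TPR = FPR = 0.5). The sketch below uses simulated labels and scores. It estimates AUC as the probability that a randomly chosen positive outscores a randomly chosen negative, which equals the area under the ROC curve.

```python
import random

def auc(y_true, scores):
    """AUC as P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

rng = random.Random(0)
y_true = [rng.randint(0, 1) for _ in range(1000)]
scores = [rng.random() for _ in y_true]  # scores ignore the label entirely
print(round(auc(y_true, scores), 2))     # close to 0.5
```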

1 answer

F1 score vs accuracy, which metric is more important?

I have two multiclass classification models for making predictions (three classes, to be precise). One is a Keras neural network; the other is a Gradient Boosted Classifier from the scikit-learn library. I have noticed that after training on the same…

classification

asked Dec 23 '19 at 16:50

Ach113
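With imbalanced classes, accuracy and F1 can disagree sharply. In this hypothetical three-class example, a model that always predicts the majority class scores 0.8 accuracy. Its macro-averaged F1 (the unweighted mean of per-class F1) is only about 0.30, because the two minority classes get an F1 of 0. Only the standard library is used.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)

# Hypothetical labels: class 0 dominates and the model always predicts 0.
y_true = [0] * 8 + [1, 2]
y_pred = [0] * 10
print(accuracy(y_true, y_pred))  # 0.8
print(macro_f1(y_true, y_pred))  # about 0.30
```

Which metric matters more depends on whether the minority classes matter to you. This sketch only shows how differently the two can rank the same predictions.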

4 answers

Measuring the uncertainty of predictions

Given a multiclass classification model with n features, how can I measure the uncertainty of the model for a particular classification? Let's say that for some class the model accuracy is amazing, but for another it's not. I would like to find…

classification

asked Jul 22 '18 at 12:36

Latent
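One common per-prediction measure is the entropy of the predicted class distribution. It is 0 for a one-hot prediction and reaches ln(K) for a uniform prediction over K classes. This is only one option, and it is only as trustworthy as the model's probability calibration. The probability vectors below are illustrative.

```python
import math

def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution; higher means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

confident = [0.96, 0.02, 0.02]
unsure = [0.34, 0.33, 0.33]
print(predictive_entropy(confident), predictive_entropy(unsure), math.log(3))
```

Averaging this quantity over the test examples predicted as each class gives a rough per-class view of where the model is unsure.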

2 answers

The meaning of multi-class classification rules

Example: I have two classification rules (Refund is a predictor and Cheat is a binary response): (Refund, No) → (Cheat, No): Support = 0.4, Confidence = 0.57; (Refund, No) → (Cheat, Yes): Support = 0.3,…

classification

asked Nov 08 '14 at 12:22

Xuan Dung
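The confidence of a rule A → B is support(A and B) / support(A). The first rule's stated numbers therefore imply that Refund = No holds in roughly 70% of records (0.4 / 0.57 ≈ 0.70). Two rules that share that antecedent split those records between the response values, so with a binary response their confidences sum to 1. The sketch uses a hypothetical 10-record table, chosen only to be consistent with the first rule's support and confidence.

```python
def support(records, items):
    """Fraction of records that match every (attribute, value) pair in items."""
    hits = sum(all(r.get(k) == v for k, v in items) for r in records)
    return hits / len(records)

def confidence(records, antecedent, consequent):
    """Confidence of antecedent -> consequent: support(both) / support(antecedent)."""
    return support(records, antecedent + consequent) / support(records, antecedent)

# Hypothetical 10-record table, not the question's data.
records = (
    [{"Refund": "No", "Cheat": "No"}] * 4
    + [{"Refund": "No", "Cheat": "Yes"}] * 3
    + [{"Refund": "Yes", "Cheat": "No"}] * 3
)
refund_no = [("Refund", "No")]
for cheat in ("No", "Yes"):
    rule = [("Cheat", cheat)]
    print(cheat, support(records, refund_no + rule), confidence(records, refund_no, rule))
```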

2 answers

Significance test and sample size estimation for classifiers

Which test tells whether, e.g., an F1 score of 0.69 for classifier A and 0.72 for classifier B are truly different and not different just by chance? (For mean values one would use a t-test and obtain a p-value.) I have access to the underlying data and…

classification

asked Aug 28 '20 at 07:56

lordy
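F1 is not a mean of per-example scores, so a paired t-test does not directly apply to it. One option that works on F1 itself is a paired permutation test, also called approximate randomization. Under the null hypothesis the two classifiers are interchangeable, so each example's pair of predictions is swapped at random and the F1 gap is recomputed. The sketch assumes binary labels and per-example predictions from both classifiers on the same test set. It is one possible approach, not necessarily the one the answers recommend, and it does not address sample-size estimation.

```python
import random

def f1(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def paired_permutation_test(y_true, pred_a, pred_b, n_rounds=2000, seed=0):
    """Two-sided p-value for the F1 difference between two classifiers on one test set."""
    rng = random.Random(seed)
    observed = abs(f1(y_true, pred_a) - f1(y_true, pred_b))
    extreme = 0
    for _ in range(n_rounds):
        a, b = [], []
        for pa, pb in zip(pred_a, pred_b):
            if rng.random() < 0.5:  # swap this example's two predictions
                pa, pb = pb, pa
            a.append(pa)
            b.append(pb)
        if abs(f1(y_true, a) - f1(y_true, b)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_rounds + 1)

# Hypothetical predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
pred_a = [1, 0, 1, 0, 0, 1, 0, 1]
pred_b = [1, 0, 0, 0, 0, 1, 1, 1]
print(paired_permutation_test(y_true, pred_a, pred_b))
```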

1 answer

Large Scale Personalization - Per User vs Global Models

I'm currently working on a project that would benefit from personalized predictions. Given an input document, a set of output documents, and a history of user behavior, I'd like to predict which of the output documents are clicked. In short, I'm…

classification

asked Jun 30 '14 at 20:51

Madison May

2 answers

F1 maximization with a convolutional neural net for an imbalanced dataset

I'm dealing with an imbalanced dataset for binary classification (about 70% to 30%). I was wondering what is the best way to optimize the F1 score for such a task when using a convolutional neural net. As of now, I'm sampling the dataset in order to…

classification

asked Dec 18 '16 at 17:11

Rimbaud_

1 answer

How are the features for a decision tree selected in CART?

Suppose I want to use CART as a classification tree (I want a categorical response). I have the training set, and I split it using observation labels. Now, to build the decision tree (classification tree), how are the features selected to decide which…

classification

asked Jul 02 '14 at 16:41

gc5
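For a classification tree, CART chooses splits greedily. At each node it tries every feature and candidate threshold and keeps the split that most reduces impurity, most commonly measured with Gini impurity. The sketch below makes a simplifying assumption: it uses the observed feature values as thresholds, whereas real implementations typically use midpoints between sorted values. The data are illustrative.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Search every feature and threshold; return (weighted_gini, feature_index, threshold)."""
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        for th in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= th]
            right = [y for r, y in zip(rows, labels) if r[j] > th]
            if not left or not right:
                continue  # a split must send examples both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, th)
    return best

rows = [[2.0, 7.0], [3.0, 1.0], [8.0, 6.0], [9.0, 2.0]]
labels = ["x", "x", "y", "y"]
print(best_split(rows, labels))  # (0.0, 0, 3.0): feature 0 at 3.0 separates the classes
```

The same search is then repeated recursively on each child node until a stopping rule applies.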

3 answers

Which non-training classification methods are available?

I am trying to find out which classification methods that do not use a training phase are available. The scenario is gene expression based classification, in which you have a matrix of gene expression of m genes (features) and n samples…

classification

asked Jul 02 '14 at 13:40

gc5

1 answer

K nearest neighbour

Is the k-nearest neighbour algorithm a discriminative or a generative classifier? My first thought on this was that it was generative, because it actually uses Bayes' theorem to compute the posterior. Searching further, it seems like it is a…

classification

asked Dec 13 '14 at 23:08

101
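A minimal sketch of the mechanism, without settling the generative/discriminative question. For a query point, k-nearest neighbours takes the k closest training points and uses the fraction of them in each class, K_k / K, as its estimate of the posterior. The prediction is the class with the largest fraction. The function names and data are illustrative, and Euclidean distance is assumed.

```python
import math
from collections import Counter

def knn_posterior(train_X, train_y, x, k=3):
    """Fraction of the k nearest training points (Euclidean) that belong to each class."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    counts = Counter(label for _, label in nearest)
    return {label: c / k for label, c in counts.items()}

def knn_predict(train_X, train_y, x, k=3):
    """Predict the class with the largest neighbour fraction (majority vote)."""
    post = knn_posterior(train_X, train_y, x, k)
    return max(post, key=post.get)

train_X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0]]
train_y = ["a", "a", "a", "b", "b"]
print(knn_posterior(train_X, train_y, [0.5, 0.5], k=3))  # {'a': 1.0}
print(knn_predict(train_X, train_y, [4.0, 4.0], k=3))    # b
```

Note that no model of how the inputs are distributed within each class is stored explicitly. The training points themselves are the model.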

1 answer

Deal with overlapping classes in classification modeling

I am currently working with a dataset comprising information about crop insurance for soybeans. My ultimate goal with this dataset is to create a classification model capable of predicting whether insurance for soybeans will be activated based on…

classification

asked Mar 22 '24 at 02:34

EduMinsky

4 answers

How to classify using incomplete features

Assume we have some features: pressure, volume, temperature, intensity, mass, size, ... The problem is that I do not always have a complete set of this information. I cannot put zero for the unknown features because zero has a meaning. For example, if I do…

classification

asked Nov 10 '21 at 15:16

Mohammad Nakhaee
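One common workaround for the "zero has a meaning" problem is to fill each missing value with a placeholder, such as a training-set mean. The encoder also appends a 0/1 "was missing" flag per feature, so a model can tell a genuine 0 apart from an unknown value. Some models can handle missing values natively instead. In the sketch, the function name `encode_with_mask` and the fill values are hypothetical.

```python
import math

def encode_with_mask(row, feature_names, fill_values):
    """Fill each missing value and append a 0/1 'was missing' flag per feature."""
    values, flags = [], []
    for name in feature_names:
        v = row.get(name)
        if v is None or (isinstance(v, float) and math.isnan(v)):
            values.append(fill_values[name])  # placeholder, e.g. a training-set mean
            flags.append(1.0)
        else:
            values.append(float(v))
            flags.append(0.0)
    return values + flags

features = ["pressure", "volume", "temperature"]
fill = {"pressure": 101.3, "volume": 1.0, "temperature": 20.0}  # hypothetical fill values
print(encode_with_mask({"pressure": 0.0, "volume": 2.5}, features, fill))
# [0.0, 2.5, 20.0, 0.0, 0.0, 1.0]
```

Here the measured pressure of 0.0 keeps flag 0, while the absent temperature gets the placeholder and flag 1.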
