Highest Voted Questions - Data Science Stack Exchange

8

votes

5 answers

Cosine similarity vs The Levenshtein distance

I want to know what the difference between them is and in which situations each works best. As per my understanding: Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the…

asked Nov 18 '19 at 08:52

Pluviophile

3,808
13
31
54

8

votes

3 answers

How to combine GridSearchCV with Early Stopping?

I'm a beginner in machine learning and want to train a CNN (for image recognition) with optimized hyperparameters such as dropout rate, learning rate, and number of epochs. I am trying to find the optimal hyperparameters via GridSearchCV from scikit-learn. I…

asked Nov 15 '19 at 17:53

Code Now

393
1
5
10

8

votes

3 answers

How to find similarity between different factors in a dataset

Introduction: Let's say I have a dataset of different observations of different people, and I want to group people together to know which person is closest to which other. I also want a measure of how close they are to each other and…

asked Jun 26 '15 at 20:48

zipp

183
1
4

8

votes

2 answers

Data anonymization in Python

I am working on an industrial project that uses real data. The data contains sensitive information about company operations that cannot be disclosed publicly. As a result, I need to anonymize the original data first before…

asked Oct 23 '19 at 23:40

Muhammad Ali

2,487
5
19
22

8

votes

1 answer

Why is word prediction an obsession in Natural Language Processing?

I have heard how great BERT is at masked word prediction, i.e. predicting a missing word from a sentence. In a Medium post about BERT, it says: The basic task of a language model is to predict words in a blank, or it predicts the probability that a…

asked Oct 16 '19 at 14:52

SamR

183
1
5

8

votes

1 answer

Difference between Gensim word2vec and keras Embedding layer

I have used the gensim word2vec package and the Keras Embedding layer in various projects. I then realized they seem to do the same thing: both try to convert a word into a feature vector. Am I understanding this properly? What exactly is the…

asked Oct 11 '19 at 13:25

Edamame

2,745
5
24
33

8

votes

2 answers

What is the difference between gradient descent and gradient boosting? Are they interdependent in any way?

What is the difference between gradient descent and gradient boosting? Are they interdependent in any way?

asked Oct 09 '19 at 14:55

star

1,471
7
19
29

8

votes

2 answers

Best way to store large data set using R from Twitter?

I am working on a project that aims to retrieve a large data set (i.e., tweet data that is a couple of days old) from Twitter using the twitteR library in R. I have difficulty storing tweets because my machine has only 8 GB of memory. It ran out of…

asked Jun 18 '15 at 18:23

Digital Dude

181
1

8

votes

2 answers

Can a decision tree learn to solve an XOR problem?

I have read online that decision trees can solve XOR-type problems, as shown in the images (XOR problem: 1) and (Possible solution as decision tree: 2). My question is how a decision tree can learn to solve this problem in this scenario. I just don't…

asked Oct 04 '19 at 12:13

lguerra

83
1
5

8

votes

3 answers

Algorithm for segmentation of sequence data

I have a large sequence of vectors of length N. I need some unsupervised learning algorithm to divide these vectors into M segments. For example: K-means is not suitable, because it puts similar elements from different locations into a single…

asked Jun 14 '15 at 10:19

generall

273
1
11

8

votes

1 answer

How to check whether all values in a particular column have the same data type?

I have a column 'ABC' with 5000 rows. Currently, the dtype of the column is object. It mostly holds string values, but some values are not strings; I want to find all those rows and modify them. The column looks as follows: 1 abc 2 def 3 ghi 4 23 5…

asked Sep 28 '19 at 14:59

Kiran

195
1
1
5

8

votes

2 answers

Visualize a horizontal box plot in R

I have a dataset like this. The data has been collected through a questionnaire and I am going to do some exploratory data analysis. windows <- c("yes", "no","yes","yes","no") sql <- c("no","yes","no","no","no") excel <-…

asked Jun 11 '15 at 15:40

Hamideh

940
2
12
22

8

votes

2 answers

Text similarity with sentence embeddings

I'm trying to calculate similarity between texts of various lengths. My current approach is as follows: Using Universal Sentence Encoder, I convert text to a set of vectors. I average these vectors to create the final feature vector. I compare…

asked Sep 19 '19 at 20:04

Kertis van Kertis

133
1
6

8

votes

4 answers

How to learn spam email detection?

I want to learn how a spam email detector is built. I'm not trying to build a commercial product; it'll be a serious learning exercise for me. Therefore, I'm looking for resources, such as existing projects, source code, articles, papers, etc. that I…

asked Jun 01 '15 at 12:36

SmallChess

3,540
2
18
30

8

votes

2 answers

Time-series prediction: Model & data assumptions in AI/ML models vs conventional models

I was wondering if there is a good paper out there that discusses model and data assumptions in AI/ML approaches. For example, if you look at Time Series Modelling (Estimation or Prediction) with Linear models or (G)ARCH/ARMA processes, there…

asked Aug 29 '19 at 06:45

Maeaex1

550
2
15
