Most Popular
1500 questions
8 votes, 5 answers
Cosine similarity vs The Levenshtein distance
What is the difference between them, and in which situations does each one work best?
As per my understanding:
Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the…
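
To make the contrast concrete, here is a minimal pure-Python sketch of both measures. It is only an illustration, not taken from the question or its answers: the function names and the character-count vectorization (`char_counts`) are assumptions chosen for the example. Cosine similarity compares the direction of two vectors, while Levenshtein distance counts single-character edits between two strings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn s into t."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_counts(text, alphabet):
    """Hypothetical vectorization: one character count per alphabet symbol."""
    return [text.count(ch) for ch in alphabet]


if __name__ == "__main__":
    a, b = "listen", "silent"  # anagrams: same characters, different order
    alphabet = sorted(set(a + b))
    print(cosine_similarity(char_counts(a, alphabet), char_counts(b, alphabet)))
    print(levenshtein(a, b))
```

With this vectorization the two anagrams have identical count vectors, so the cosine score ignores character order entirely, whereas the edit distance is sensitive to it.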

Pluviophile
8 votes, 3 answers
How to combine GridSearchCV with Early Stopping?
I'm a beginner in machine learning and want to train a CNN (for image recognition) with optimized hyperparameters such as dropout rate, learning rate, and number of epochs.
I am trying to find the optimal hyperparameters with GridSearchCV from scikit-learn.
I…

Code Now
8 votes, 3 answers
How to find similarity between different factors in a dataset
Introduction
Let's say I have a dataset of observations of different people, and I want to group people together to find out which person is closest to which. I also want a measure of how close they are to each other and…

zipp
8 votes, 2 answers
Data anonymization in Python
I am working on an industrial project that uses real data. The data contains sensitive information about company operations that cannot be disclosed publicly. As a result, I need to anonymize the original data first before…

Muhammad Ali
8 votes, 1 answer
Why is word prediction an obsession in Natural Language Processing?
I have heard how great BERT is at masked word prediction, i.e. predicting a missing word from a sentence.
In a Medium post about BERT, it says:
The basic task of a language model is to predict words in a blank, or it predicts the probability that a…

SamR
8 votes, 1 answer
Difference between Gensim word2vec and keras Embedding layer
I have used the gensim word2vec package and the Keras Embedding layer in various projects. Then I realized that they seem to do the same thing: both try to convert a word into a feature vector.
Am I understanding this properly? What exactly is the…

Edamame
8 votes, 2 answers
What is the difference between gradient descent and gradient boosting? Do they depend on each other in any way?
What is the difference between gradient descent and gradient boosting? Do they depend on each other in any way?

star
8 votes, 2 answers
Best way to store large data set using R from Twitter?
I am working on a project that aims to retrieve a large data set (i.e., tweet data that is a couple of days old) from Twitter using the twitteR library in R. I have difficulty storing the tweets because my machine has only 8 GB of memory. It ran out of…

Digital Dude
8 votes, 2 answers
Can a decision tree learn to solve an XOR problem?
I have read online that decision trees can solve XOR-type problems, as shown in images (XOR problem: 1) and (Possible solution as decision tree: 2).
My question is how can a decision tree learn to solve this problem in this scenario. I just don't…
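
As a rough illustration of the scenario (not the images referenced in the question), the sketch below uses the four-point XOR truth table and two hypothetical helpers, `gini_gain` and `xor_tree`. It assumes a greedy learner that scores splits by Gini impurity: at the root neither feature yields any gain on its own, yet after one split the second feature separates each branch perfectly, so a depth-2 tree represents XOR exactly. How a real library breaks the tie at the root is implementation-specific.

```python
from collections import Counter

# XOR truth table: ((x1, x2), label)
DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_gain(rows, feature):
    """Impurity reduction from splitting rows on a binary feature."""
    left = [y for x, y in rows if x[feature] == 0]
    right = [y for x, y in rows if x[feature] == 1]
    n = len(rows)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini([y for _, y in rows]) - weighted


def xor_tree(x1, x2):
    """Hand-written depth-2 tree: split on x1 first, then on x2."""
    if x1 == 0:
        return 1 if x2 == 1 else 0
    return 0 if x2 == 1 else 1


if __name__ == "__main__":
    # At the root, neither feature reduces impurity by itself.
    print(gini_gain(DATA, 0), gini_gain(DATA, 1))
    # Inside the x1 == 0 branch, x2 separates the classes completely.
    branch = [row for row in DATA if row[0][0] == 0]
    print(gini_gain(branch, 1))
    # The depth-2 tree classifies every XOR point correctly.
    print(all(xor_tree(*x) == y for x, y in DATA))
```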

lguerra
8 votes, 3 answers
Algorithm for segmentation of sequence data
I have a large sequence of vectors of length N. I need some unsupervised learning algorithm to divide these vectors into M segments.
For example:
K-means is not suitable, because it puts similar elements from different locations into a single…

generall
8 votes, 1 answer
How to check whether all values in a particular column have the same data type?
I have a column 'ABC' with 5000 rows. Currently, the dtype of the column is object. Most values are strings, but some are not; I want to find all of those rows and modify them. The column looks like this:
1 abc
2 def
3 ghi
4 23
5…
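
One way to approach the check described above, sketched in plain Python. The list `column` is a hypothetical stand-in for the 'ABC' column (the real values beyond the excerpt are unknown), and the helper names are made up for this example. It also assumes the odd values are genuinely non-`str` objects rather than digit strings such as "23". With pandas, the same idea could be expressed with something like `df['ABC'].map(type)`.

```python
# Hypothetical stand-in for the 'ABC' column: mostly strings, a few other types.
column = ["abc", "def", "ghi", 23, 4.5, "jkl"]


def non_string_rows(values):
    """Return (position, value, type name) for every value that is not a str."""
    return [(i, v, type(v).__name__)
            for i, v in enumerate(values)
            if not isinstance(v, str)]


def stringify(values):
    """One possible modification: convert every non-string value to str."""
    return [v if isinstance(v, str) else str(v) for v in values]


if __name__ == "__main__":
    for position, value, type_name in non_string_rows(column):
        print(position, repr(value), type_name)
    cleaned = stringify(column)
    print(all(isinstance(v, str) for v in cleaned))
```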

Kiran
8 votes, 2 answers
Visualize a horizontal box plot in R
I have a dataset like this. The data has been collected through a questionnaire and I am going to do some exploratory data analysis.
windows <- c("yes", "no","yes","yes","no")
sql <- c("no","yes","no","no","no")
excel <-…

Hamideh
8 votes, 2 answers
Text similarity with sentence embeddings
I'm trying to calculate the similarity between texts of various lengths. My current approach is the following:
Using Universal Sentence Encoder, I convert text to a set of vectors.
I average these vectors to create the final feature vector.
I compare…
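
The averaging step in the approach above can be sketched as follows. The tiny 3-dimensional vectors are made-up placeholders for real Universal Sentence Encoder outputs, and using cosine similarity for the final comparison is an assumption, since the excerpt is cut off before it says how the vectors are compared.

```python
import math


def mean_pool(vectors):
    """Average a list of equal-length vectors into one feature vector."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))


if __name__ == "__main__":
    # Placeholder sentence vectors for two texts of different lengths.
    text_a = [[0.2, 0.1, 0.7], [0.3, 0.0, 0.6], [0.1, 0.2, 0.8]]
    text_b = [[0.25, 0.05, 0.65], [0.15, 0.15, 0.75]]
    print(cosine(mean_pool(text_a), mean_pool(text_b)))
```

Because each text is reduced to a single mean vector, texts with different numbers of sentences become directly comparable, at the cost of discarding sentence order.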

Kertis van Kertis
8 votes, 4 answers
How to learn spam email detection?
I want to learn how a spam email detector is built. I'm not trying to build a commercial product; it will be a serious learning exercise for me. Therefore, I'm looking for resources, such as existing projects, source code, articles, papers, etc., that I…

SmallChess
8 votes, 2 answers
Time-series prediction: Model & data assumptions in AI/ML models vs conventional models
I was wondering whether there is a good paper that discusses the model and data assumptions of AI/ML approaches.
For example, if you look at Time Series Modelling (Estimation or Prediction) with Linear models or (G)ARCH/ARMA processes, there…

Maeaex1