Most Popular
1500 questions
8
votes
3 answers
Why is deep learning used in recommender systems?
I am currently reading a lot about recommender systems (RS) and have noticed that many of them are based on deep learning.
However, I have not found a good scientific article explaining why deep learning is used in RS and why it is more successful compared to other…

Ella
- 179
- 1
8
votes
1 answer
R error using package tm (text-mining)
I am attempting to use the tm package to convert a vector of text strings to a corpus element.
My code looks something like this:
Corpus(d1$Yes)
where d1$Yes is a factor with 124 levels, each containing a text string.
For example, d1$Yes[246] = "So…

Ivoire
- 89
- 1
- 3
8
votes
5 answers
How do I encode the categorical columns if there are more than 15 unique values?
I'm trying to use this data to write a data analysis report using regression. Since regression only accepts numerical inputs, I need to encode the categorical data. However, most of these columns, such as country, have more than 15 unique values.
Do I…

Cinemato
- 81
- 1
- 2
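One option sometimes suggested for the high-cardinality encoding question above is frequency encoding, which replaces each category with its relative frequency and so adds one numeric column instead of one column per value. The sketch below is only an illustration under assumptions: the `countries` list is hypothetical sample data (the asker's data is not shown), and `frequency_encode` is a helper name introduced here, not a library function.

```python
from collections import Counter


def frequency_encode(values):
    """Map each category to its relative frequency within `values`."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]


# Hypothetical sample column; the real data is not shown in the question.
countries = ["US", "FR", "US", "DE", "US", "FR"]
encoded = frequency_encode(countries)
print(encoded)
```

Categories that happen to share the same frequency become indistinguishable under this encoding, so it may not suit every regression problem.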
8
votes
3 answers
Which algorithms or methods can be used to detect an outlier from this data set?
Suppose I have a data set: amount of money (100, 50, 150, 200, 35, 60, 50, 20, 500). I have searched the web for techniques that can be used to find a possible outlier in this data set, but I ended up confused.
My question is: Which…

CN1002
- 243
- 2
- 7
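For the outlier question above, one possible approach (among several) is the modified z-score based on the median absolute deviation (MAD), which is less affected by the extreme values it is trying to detect than a mean-based score. The sketch below applies it to the amounts from the question. The constant 0.6745 and the cutoff 3.5 are commonly used conventions, assumed here rather than taken from the question, and `mad_outliers` is a helper name introduced for illustration.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds `threshold`."""
    med = median(values)
    mad = median(abs(x - med) for x in values)
    if mad == 0:
        # All deviations collapse to zero; the score is undefined.
        return []
    return [x for x in values if 0.6745 * abs(x - med) / mad > threshold]


# Data set taken from the question.
amounts = [100, 50, 150, 200, 35, 60, 50, 20, 500]
outliers = mad_outliers(amounts)
print(outliers)  # [500]
```

With a median of 60 and a MAD of 40, only 500 has a modified z-score above the cutoff (about 7.4); 200 scores about 2.4.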
8
votes
4 answers
Data science and MapReduce programming model of Hadoop
What are the different classes of data science problems that can be solved using the MapReduce programming model?

10land
- 369
- 3
- 10
8
votes
1 answer
MLflow real world experience
Can someone provide a summary of real-world deployment experience with MLflow? We have a few ML models (e.g., LightGBM, TensorFlow v2, etc.) and want to avoid frameworks like SageMaker (due to a customer requirement). So we are looking into various…

David293836
- 197
- 1
- 6
8
votes
1 answer
Who invented the concept of over-fitting?
I list the references that I found so far.
In short, the first appearance of the term was in 1670, the first appearance in a close meaning was in 1827, the first appearance in a biological paper was in 1923, and the first appearance in statistics was in…

DaL
- 2,633
- 12
- 13
8
votes
2 answers
What is the difference between GPT blocks and Transformer Decoder blocks?
I know GPT is a Transformer-based Neural Network, composed of several blocks. These blocks are based on the original Transformer's Decoder blocks, but are they exactly the same?
In the original Transformer model, Decoder blocks have two attention…

Leevo
- 6,225
- 3
- 16
- 52
8
votes
1 answer
Cosine Distance > 1 in scipy
I am working on a recommendation engine, and I have chosen to use SciPy's cosine distance as a way of comparing items.
I have two vectors:
a = [2.7654870801855078, 0.35995355443076027, 0.016221679989074141, -0.012664358453398751,…

redgem
- 183
- 1
- 1
- 4
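As background to the cosine distance question above: cosine distance is conventionally defined as one minus cosine similarity, and since similarity ranges over [-1, 1], the distance ranges over [0, 2]. A value above 1 therefore indicates vectors pointing in broadly opposite directions rather than an error. The sketch below is a plain-Python illustration of that definition with made-up two-dimensional vectors; it is not SciPy's implementation, and `cosine_distance` is a helper defined here.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical vectors chosen to show the three reference cases.
same_direction = cosine_distance([1.0, 0.0], [2.0, 0.0])   # similarity  1
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])       # similarity  0
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])        # similarity -1
print(same_direction, orthogonal, opposite)
```

Vectors with negative components, like the one in the question, can have negative similarity, which is what pushes the distance past 1.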
8
votes
5 answers
Filling missing data with other than mean values
What are all the options available for filling in missing data?
One obvious choice is the mean, but if the percentage of missing data is large, it will decrease the accuracy.
So how do we deal with missing values if there are a lot of them?

mach
- 367
- 1
- 4
- 9
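One simple alternative to the mean, relevant to the missing-data question above, is median imputation, which is less sensitive to extreme values in the observed data. The sketch below is a minimal illustration only: the `column` values are hypothetical, `None` is assumed to mark a missing entry, and `fill_with_median` is a helper name introduced here. It does not address the accuracy concern when a large share of values is missing.

```python
from statistics import median


def fill_with_median(values):
    """Replace None entries with the median of the observed entries."""
    observed = [x for x in values if x is not None]
    if not observed:
        # Nothing to estimate from; return the data unchanged.
        return list(values)
    fill = median(observed)
    return [fill if x is None else x for x in values]


# Hypothetical column with two missing entries and one extreme value.
column = [1.0, None, 3.0, None, 100.0]
filled = fill_with_median(column)
print(filled)  # [1.0, 3.0, 3.0, 3.0, 100.0]
```

Here the mean of the observed values would be about 34.7, while the median is 3.0, so the filled entries stay close to the typical observed values.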
8
votes
4 answers
One Hot encoding for large number of values
How do we use one-hot encoding if the number of values a categorical variable can take is large?
In my case it is 56 values. So, as per the usual method, I would have to add 56 columns (56 binary features) to the training dataset, which will…

mach
- 367
- 1
- 4
- 9
8
votes
2 answers
Can learning algorithms take in data along with their uncertainty? (chaining ML algorithms along with errors)
How can statistical methods (estimators or classifiers) be chained while taking into account the uncertainty (error) of the previous step?
Example: Consider a pipeline, where housing prices are estimated from census and geographical data and are fed into another…

duggi
- 131
- 4
8
votes
2 answers
Difference between training and test data distributions
A basic assumption in machine learning is that training and test data are drawn from the same population, and thus follow the same distribution. But, in practice, this is highly unlikely. Covariate shift addresses this issue. Can someone clear the…

Daniel Wonglee
- 191
- 1
- 4
8
votes
3 answers
Feature importance after classification
I have time series data with roughly 200 features per sample, and I used a recurrent neural network for the binary classification task.
After the classification I would like to know which features contribute most to one of the targets (let's say…

Rick0
- 105
- 4
8
votes
2 answers
How does word2vec handle the input word being in the context?
If word2vec encounters the same word multiple times in the same window, what occurs? Obviously it is meaningless to decrease the distance between the vectors for the input word and the target word. But will the repetition strengthen the…

jamesmf
- 3,097
- 1
- 17
- 25