Most Popular

1500 questions
8
votes
3 answers

Why is deep learning used in recommender systems?

I am currently reading a lot about recommender systems (RS) and have noticed that many RS are based on deep learning. However, I have never found a good scientific article explaining why deep learning is used in RS and why it is more successful than other…
Ella
  • 179
  • 1
8
votes
1 answer

R error using package tm (text-mining)

I am attempting to use the tm package to convert a vector of text strings to a corpus element. My code looks something like this: Corpus(d1$Yes), where d1$Yes is a factor with 124 levels, each containing a text string. For example, d1$Yes[246] = "So…
Ivoire
  • 89
  • 1
  • 3
8
votes
5 answers

How do I encode the categorical columns if there are more than 15 unique values?

I'm trying to use this data to make a data analysis report using regression. Since regression only accepts numerical inputs, I need to encode the categorical data. However, most of these columns, such as country, have more than 15 unique values. Do I…
Cinemato
  • 81
  • 1
  • 2
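One common option for a high-cardinality column like country is to keep only the most frequent categories and lump the rest into a single "Other" level before one-hot encoding. The sketch below assumes a hypothetical list of country codes and a cutoff `top_k` that you would tune; frequency or target encoding are alternatives not shown here.

```python
from collections import Counter

def collapse_rare(values, top_k):
    """Keep the top_k most frequent categories; map everything else to 'Other'."""
    keep = {cat for cat, _ in Counter(values).most_common(top_k)}
    return [v if v in keep else "Other" for v in values]

def one_hot(values):
    """Return (column_names, rows) with one 0/1 column per distinct category."""
    columns = sorted(set(values))
    rows = [[1 if v == c else 0 for c in columns] for v in values]
    return columns, rows

# Hypothetical data: US appears 3 times, UK twice, the rest once each
countries = ["US", "US", "UK", "FR", "US", "UK", "DE", "JP", "BR"]
cols, rows = one_hot(collapse_rare(countries, top_k=2))
print(cols)     # ['Other', 'UK', 'US']
print(rows[3])  # FR was collapsed into Other → [1, 0, 0]
```

The number of new columns is then bounded by `top_k + 1` regardless of how many distinct countries appear.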
8
votes
3 answers

Which algorithms or methods can be used to detect an outlier from this data set?

Suppose I have a data set of amounts of money: (100, 50, 150, 200, 35, 60, 50, 20, 500). I have searched the web for techniques that can be used to find a possible outlier in this data set, but I ended up confused. My question is: Which…
CN1002
  • 243
  • 2
  • 7
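A simple, widely used rule for a small one-dimensional sample like this is Tukey's fences: flag values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR. The sketch below applies it to the amounts from the question; it assumes Python 3.8+ for `statistics.quantiles` (default "exclusive" quartile method), and the multiplier `k=1.5` is the conventional choice, not the only one.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, 'exclusive' method
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

amounts = [100, 50, 150, 200, 35, 60, 50, 20, 500]
print(iqr_outliers(amounts))  # → [500]
```

Here Q1 = 42.5 and Q3 = 175, so the upper fence is 373.75 and only 500 is flagged. Quartiles are robust to the extreme value itself, which is why this rule tends to behave better than a mean/standard-deviation z-score on tiny samples.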
8
votes
4 answers

Data science and MapReduce programming model of Hadoop

What classes of data science problems can be solved using the MapReduce programming model?
10land
  • 369
  • 3
  • 10
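Problems that fit MapReduce well are those that split into independent per-record work (map) followed by an aggregation of values grouped by key (reduce): counting, summing, building inverted indexes, computing per-group statistics. The sketch below simulates the classic word-count job in a single Python process to show the three phases; it does not use Hadoop's actual API, and the input documents are made up.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Mapper: emit (word, 1) for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)

docs = ["the cat sat", "the dog sat", "The end"]
intermediate = chain.from_iterable(map_phase(d) for d in docs)
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())
print(counts["the"], counts["sat"])  # → 3 2
```

Because each mapper call and each reducer call is independent, a framework can distribute them across machines; problems needing many dependent passes over the data (e.g. iterative model training) fit this model less naturally.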
8
votes
1 answer

MLflow real world experience

Can someone provide a summary of real-world deployment experience with MLflow? We have a few ML models (e.g., LightGBM, TensorFlow v2, etc.) and want to avoid frameworks like SageMaker (due to a customer requirement). So we are looking into various…
David293836
  • 197
  • 1
  • 6
8
votes
1 answer

Who invented the concept of over-fitting?

I list the references that I found so far. In short, the first appearance of the term was in 1670, the first appearance in a close meaning was in 1827, the first appearance in a biological paper was in 1923, and the first appearance in statistics was in…
DaL
  • 2,633
  • 12
  • 13
8
votes
2 answers

What is the difference between GPT blocks and Transformer Decoder blocks?

I know GPT is a Transformer-based Neural Network, composed of several blocks. These blocks are based on the original Transformer's Decoder blocks, but are they exactly the same? In the original Transformer model, Decoder blocks have two attention…
Leevo
  • 6,225
  • 3
  • 16
  • 52
8
votes
1 answer

Cosine Distance > 1 in scipy

I am working on a recommendation engine, and I have chosen to use SciPy's cosine distance as a way of comparing items. I have two vectors: a = [2.7654870801855078, 0.35995355443076027, 0.016221679989074141, -0.012664358453398751,…
redgem
  • 183
  • 1
  • 1
  • 4
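SciPy's `scipy.spatial.distance.cosine` returns 1 minus the cosine similarity, and cosine similarity lies in [-1, 1], so the distance lies in [0, 2]. Whenever the two vectors have a negative dot product (they point more than 90 degrees apart, which is possible when components can be negative, as in the vectors above), the distance exceeds 1. The pure-Python sketch below reproduces that definition so the range is easy to check; it is not SciPy's implementation.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity, the same definition scipy.spatial.distance.cosine uses."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

print(cosine_distance([1.0, 0.0], [0.0, 1.0]))     # orthogonal → 1.0
print(cosine_distance([3.0, 4.0], [-3.0, -4.0]))   # opposite → 2.0
print(cosine_distance([1.0, 0.0], [-1.0, 1.0]) > 1)  # negative dot product → True
```

If a similarity in [0, 1] is needed for a recommender, one option is to rescale, e.g. `1 - d / 2`.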
8
votes
5 answers

Filling missing data with other than mean values

What are all the options available for filling in missing data? One obvious choice is the mean, but if the percentage of missing data is large, it will decrease the accuracy. So how do we deal with missing values if there are a lot of them?
mach
  • 367
  • 1
  • 4
  • 9
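Two alternatives to a global mean are the median (less sensitive to skew and outliers) and a group-wise statistic computed from a related column, plus an explicit "was missing" flag so a model can learn whether missingness itself is informative. The sketch below combines all three; the field names `city` and `income` and the rows are hypothetical.

```python
import statistics

def impute_group_median(rows, target, group_key):
    """Fill None in row[target] with its group's median, else the global median."""
    observed = [r[target] for r in rows if r[target] is not None]
    global_median = statistics.median(observed)
    by_group = {}
    for r in rows:
        if r[target] is not None:
            by_group.setdefault(r[group_key], []).append(r[target])
    filled = []
    for r in rows:
        new = dict(r)
        new[f"{target}_was_missing"] = r[target] is None  # missing-indicator flag
        if r[target] is None:
            group_vals = by_group.get(r[group_key])
            new[target] = statistics.median(group_vals) if group_vals else global_median
        filled.append(new)
    return filled

rows = [
    {"city": "A", "income": 30}, {"city": "A", "income": 34}, {"city": "A", "income": None},
    {"city": "B", "income": 90}, {"city": "B", "income": None}, {"city": "C", "income": None},
]
out = impute_group_median(rows, "income", "city")
for r in out:
    print(r)
```

City C has no observed incomes, so it falls back to the global median (34). Model-based approaches such as k-nearest-neighbour or regression imputation extend the same idea of using related columns.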
8
votes
4 answers

One-hot encoding for a large number of values

How do we use one-hot encoding if the number of values a categorical variable can take is large? In my case it is 56 values. So with the usual method I would have to add 56 columns (56 binary features) to the training dataset, which will…
mach
  • 367
  • 1
  • 4
  • 9
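One way to cap the number of columns is the "hashing trick": hash each category into one of a fixed number of buckets and set that bucket to 1. The sketch below uses `zlib.crc32` because Python's built-in `hash()` for strings is salted per process and would not be reproducible; the bucket count of 16 and the category names are assumptions for illustration. Distinct categories can collide in the same bucket, which is the price of the smaller representation.

```python
import zlib

def hashed_one_hot(value, n_buckets=16):
    """Map a category to a fixed-length 0/1 vector via a stable hash (the hashing trick)."""
    vec = [0] * n_buckets
    # zlib.crc32 is deterministic across runs, unlike the salted built-in hash() for str
    vec[zlib.crc32(value.encode("utf-8")) % n_buckets] = 1
    return vec

categories = [f"category_{i}" for i in range(56)]  # hypothetical 56-level variable
encoded = [hashed_one_hot(c) for c in categories]
print(len(encoded[0]))  # → 16 columns instead of 56
```

scikit-learn ships a ready-made version of this idea as `sklearn.feature_extraction.FeatureHasher`; note also that 56 binary columns is often manageable as-is, especially with sparse matrices.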
8
votes
2 answers

Can learning algorithms take in data along with their uncertainty? (chaining ML algorithms along with errors)

How can I chain statistical methods (estimators or classifiers) while taking into account the uncertainty (error) of the previous step? For example, consider a pipeline where housing prices are estimated from census and geographical data and are fed into another…
duggi
  • 131
  • 4
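A generic way to carry first-stage uncertainty into a second stage is Monte Carlo propagation: draw many samples from the first stage's predictive distribution, push each through the downstream model, and summarize the spread of the outputs. The sketch below assumes the housing-price estimate has Gaussian error with a known standard deviation, and `stage2_model` is a hypothetical downstream model invented for illustration.

```python
import random
import statistics

def stage2_model(price):
    """Hypothetical downstream model that consumes an estimated housing price."""
    return 0.05 * price + 1000.0

def propagate(mean, std, model, n_samples=10_000, seed=0):
    """Monte Carlo: sample the upstream estimate, run each sample through the model."""
    rng = random.Random(seed)
    outputs = [model(rng.gauss(mean, std)) for _ in range(n_samples)]
    return statistics.mean(outputs), statistics.stdev(outputs)

# Upstream estimate: price about 300,000 with std 20,000 (assumed Gaussian error)
out_mean, out_std = propagate(300_000.0, 20_000.0, stage2_model)
print(round(out_mean), round(out_std))
```

For this linear stand-in the exact answer is mean 16,000 and standard deviation 1,000, so the simulation can be checked; the value of the approach is that it works unchanged for nonlinear or black-box second-stage models.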
8
votes
2 answers

Difference between training and test data distributions

A basic assumption in machine learning is that training and test data are drawn from the same population and thus follow the same distribution. In practice, however, this is highly unlikely. Covariate shift addresses this issue. Can someone clear the…
Daniel Wonglee
  • 191
  • 1
  • 4
8
votes
3 answers

Feature importance after classification

I have time series data and roughly 200 features for each sample, and I used a recurrent neural network for the binary classification task. After the classification I would like to know which features contribute most to one of the targets (let's say…
Rick0
  • 105
  • 4
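A model-agnostic option that works for a trained RNN is permutation importance: shuffle one feature across samples (keeping each sample's time ordering of that feature intact) and measure how much accuracy drops. The sketch below assumes data shaped samples × time steps × features as nested lists, and `toy_model` is a hypothetical stand-in for the trained network; with a real RNN you would call its prediction method instead.

```python
import random

def accuracy(model, X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_features, seed=0):
    """Accuracy drop when feature f's whole sequence is shuffled across samples."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    drops = []
    for f in range(n_features):
        # Collect feature f's sequence from each sample, then reassign them at random
        seqs = [[step[f] for step in sample] for sample in X]
        rng.shuffle(seqs)
        X_perm = [
            [step[:f] + [seq[t]] + step[f + 1:] for t, step in enumerate(sample)]
            for sample, seq in zip(X, seqs)
        ]
        drops.append(base - accuracy(model, X_perm, y))
    return drops

def toy_model(sample):
    """Hypothetical classifier: predicts 1 when feature 0 sums to a positive value."""
    return int(sum(step[0] for step in sample) > 0)

rng = random.Random(42)
X = [[[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(3)] for _ in range(200)]
y = [toy_model(x) for x in X]
drops = permutation_importance(toy_model, X, y, n_features=2)
print(drops)  # feature 0 shows a large drop; feature 1 (ignored by the model) shows 0.0
```

To attribute importance toward one target class specifically, the same loop can measure the drop in recall or predicted probability for that class instead of overall accuracy.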
8
votes
2 answers

How does word2vec handle the input word being in the context?

If word2vec encounters the same word multiple times in the same window, what occurs? Obviously it is meaningless to decrease the distance between the vectors for the input word and the target word. But will the repetition strengthen the…
jamesmf
  • 3,097
  • 1
  • 17
  • 25
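The sketch below generates skip-gram (center, context) training pairs from a window in the usual way: only the center *position* is skipped, so another occurrence of the same word inside the window still yields a (word, word) pair. It is an illustration of the windowing logic under that assumption, not a reproduction of any specific word2vec implementation; the example sentence is made up.

```python
def skipgram_pairs(tokens, window=2):
    """(center, context) pairs for every position within `window` tokens of the center.

    Only the center position itself is skipped; a repeated word elsewhere in the
    window still produces a (word, word) pair.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs(["new", "york", "new", "jersey"], window=2)
print([p for p in pairs if p[0] == p[1]])  # → [('new', 'new'), ('new', 'new')]
```

In skip-gram with negative sampling the input ("center") and output ("context") vectors are separate parameter matrices, so such a pair pulls a word's input vector toward its own output vector rather than toward itself.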