Most Popular
1500 questions
8 votes, 1 answer
Encoding with OrdinalEncoder: how to give levels as user input?
I am trying to do ordinal encoding using:
from sklearn.preprocessing import OrdinalEncoder
I will try to explain my problem with a simple dataset.
X = pd.DataFrame({'animals':['low','med','low','high','low','high']})
enc =…

Ayush Ranjan
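A minimal sketch of one way to pass a user-defined level order to the encoder, using the toy data from the excerpt. The ordering `low < med < high` and the names `levels` and `encoded` are assumptions made for illustration; the excerpt is truncated before the asker's own attempt.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

X = pd.DataFrame({'animals': ['low', 'med', 'low', 'high', 'low', 'high']})

# Hypothetical user-supplied ordering, lowest level first.
levels = ['low', 'med', 'high']

# `categories` expects one list of levels per encoded column.
enc = OrdinalEncoder(categories=[levels])
encoded = enc.fit_transform(X[['animals']])

print(encoded.ravel())
```

Each value is mapped to its position in `levels`, so the integer codes follow the supplied order rather than alphabetical order.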
8 votes, 2 answers
Joining tables from different locations in Bigquery
I have been trying to join two tables from different datasets that are in different locations but in the same project. However, I keep getting the error:
dataset not found in US location.
The datasets' locations are US and us-east1
Here is what I…

shivanshu dhawan
8 votes, 3 answers
Difference between Ridge and Linear Regression
From what I have understood, Ridge Regression is just the loss function of the optimization problem with a regularization term added (the L2 norm in the case of Ridge). However, I am not sure if the loss function can be described…

Panathinaikos
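To make the "loss plus L2 term" description concrete, here is a small NumPy sketch comparing the closed-form solutions with and without the penalty. The synthetic data, the value of `alpha`, and the omission of an intercept are all assumptions for illustration, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

alpha = 10.0  # hypothetical regularization strength

# Ordinary least squares: minimize ||y - Xw||^2
# Normal equations: (X^T X) w = X^T y
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge: minimize ||y - Xw||^2 + alpha * ||w||^2
# Normal equations: (X^T X + alpha * I) w = X^T y
w_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

print(np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
```

With `alpha = 0` the two systems coincide; for any positive `alpha` the ridge weights have a smaller norm than the least-squares weights.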
8 votes, 2 answers
What should be the labels for subword tokens in BERT for NER task?
For any NER task, we need a sequence of words and their corresponding labels.
To extract features for these words from BERT, they need to be tokenized into subwords.
For example, the word 'infrequent' (with label B-count) will be tokenized into…

PinkBanter
8 votes, 2 answers
Why does the vanilla transformer have a fixed-length input?
I know that the math on which the transformer is based places no restriction on the length of the input. But I still can’t understand why we should fix it in the frameworks (PyTorch). Transformer-XL was created because of this problem.
Can you…

Ann
8 votes, 1 answer
Which of the NIPS 2014 papers are most significant, and why?
As a newcomer to the field, I find many of the NIPS 2014 papers fascinating, but it is difficult for me to evaluate which ones represent real progress over current approaches.
Which papers do you think are most significant and are likely to have a…

Michael R. Bernstein
8 votes, 4 answers
How are Q, K, and V Vectors Trained in a Transformer Self-Attention?
I am new to transformers, so this may be a silly question, but I was reading about transformers and how they use attention, and it involves the usage of three special vectors. Most articles say that one will understand their purpose after reading…

arctic_hen7
8 votes, 2 answers
What are some standard ways of computing the distance between individual search queries?
I asked a similar question about the distance between "documents" (Wikipedia articles, news stories, etc.). I made this a separate question because search queries are considerably smaller and noisier than documents. I hence…

Matt
8 votes, 1 answer
Why does gradient boosting use sampling without replacement?
In Random Forest, each tree is built by selecting a sample with replacement (bootstrap), and I assumed that Gradient Boosting's trees were built with the same sampling technique (@BenReiniger corrected me). Here are the sampling techniques…

Carlos Mougan
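For readers unsure of the distinction the question draws, a short NumPy sketch of the two row-sampling schemes follows. The row count and the 0.5 subsample fraction are assumptions for illustration; no particular library's defaults are implied.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows = 10

# Bootstrap: draw n_rows indices WITH replacement, so rows can repeat
# and some rows are typically left out.
bootstrap_idx = rng.choice(n_rows, size=n_rows, replace=True)

# Subsampling: draw a fraction of the rows WITHOUT replacement,
# so every selected row appears exactly once.
subsample_idx = rng.choice(n_rows, size=n_rows // 2, replace=False)

print(sorted(bootstrap_idx.tolist()))
print(sorted(subsample_idx.tolist()))
```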
8 votes, 2 answers
How can word2vec handle unseen / new words to bypass this for new classifications?
In simple terms, if my classification is based on word2vec features, what am I supposed to do if a new word arrives that does not have a word2vec vector?
I am trying to use word2vec or word vectors for classification based on entity.
For example:
I…

Sarath
8 votes, 2 answers
Linearly increasing data with manual reset
I have a linearly increasing time series dataset from a sensor, with values ranging between 50 and 150. I've implemented a Simple Linear Regression algorithm to fit a regression line to this data, and I'm predicting the date when the series would reach…

ArunDhaJ
8 votes, 1 answer
sklearn SimpleImputer too slow for categorical data represented as string values
I have a data set with categorical features represented as string values and I want to fill in the missing values. I’ve tried to use sklearn’s SimpleImputer, but it takes too much time to complete the task compared to pandas. Both methods produce…

vlc146543
8 votes, 1 answer
TensorFlow / Keras: What is stateful = True in LSTM layers?
Could you elaborate on this argument? I found the brief explanation from the docs unsatisfying:
stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i…

Leevo
8 votes, 2 answers
NLP: variations of a text without modifying its meaning
I am currently working on the automation of recurring reports (weekly 30-50 page reports for around 100 districts). Those reports have a mostly fixed form: maps, graphs, data tables and small zones of text.
Apart from some discussion around colors…

Lucas Morin
8 votes, 3 answers
Pivoting a two-column feature table in Pandas
How can I transform the following DataFrame into one with cities as rows and each cuisine as a column, and 1 or 0 as values (1 if the city has that kind of cuisine)?
I think this turns out to be a very common problem in transforming data into…

blue-dino
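One possible approach to the pivot described above, sketched with `pd.crosstab`. The asker's DataFrame is not shown in the excerpt, so the column names `city` and `cuisine` and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical two-column table: one row per (city, cuisine) pair.
df = pd.DataFrame({
    'city': ['Rome', 'Rome', 'Tokyo', 'Paris'],
    'cuisine': ['Italian', 'Japanese', 'Japanese', 'French'],
})

# crosstab counts the pairs; clip(upper=1) turns the counts into
# 1/0 indicators even if a pair is listed more than once.
indicator = pd.crosstab(df['city'], df['cuisine']).clip(upper=1)

print(indicator)
```

The result has one row per city and one column per cuisine, with 1 where the pair occurs in the input and 0 otherwise.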