Most Popular
1500 questions
8
votes
4 answers
Why is there a difference between predicting on Validation set and Test set?
I have an XGBoost model trying to predict whether a currency will go up or down in the next period (5 min). I have a dataset from 2004 to 2018. I split the data randomly into 95% train and 5% validation, and the accuracy on the validation set is up to 55%. When…

DBSE
- 221
- 2
- 3
8
votes
1 answer
Complex Chunking with NLTK
I am trying to figure out how to use NLTK's cascading chunker as per Chapter 7 of the NLTK book. Unfortunately, I'm running into a few issues when performing non-trivial chunking measures.
Let's start with this phrase:
"adventure movies between 2000…

grill
- 234
- 3
- 7
8
votes
1 answer
Gensim LDA model: return keywords based on relevance (λ - lambda) value
I am using the gensim library for topic modeling, more specifically LDA. I created my corpus, my dictionary, and my LDA model. With the help of the pyLDAvis library I visualized the results. When I print the words with the highest probability on…

Tasos Lytos
- 81
- 4
8
votes
1 answer
Which classification algorithms to try for classifying text data into 300 categories
I have 40,000 rows of text data from the health care domain. The data has one column for text (2-5 sentences) and one column for its category.
I want to classify it into 300 categories. Some categories are independent while some are somewhat related.…

Alok Nayak
- 191
- 1
- 5
8
votes
2 answers
How to use Graph Neural Network to predict relationships between nodes with pytorch_geometric?
Let's say I have a partly connected graph that represents members of many unrelated communities. I would like to predict the possible friendships between members of the same community: on a sliding scale from 0 to 10, how likely would they like…

Soerendip
- 724
- 1
- 9
- 16
8
votes
5 answers
What is the state of the art in question generation with NLP?
I was trying out various projects available for question generation on GitHub, namely NQG, question-generation, and a lot of others, but I don't see good results from them: either they have very bad question formation or the questions generated are…

Jack109
- 108
- 1
- 10
8
votes
2 answers
Why is taking the gradient of the average error in SGD not correct, but rather the average of the gradients of single errors?
I am a little confused about taking averages in cost functions and SGD. So far I always thought in SGD you would compute the average error for a batch and then backpropagate it. But then I was told in a comment on this question that that was wrong.…

lo tolmencre
- 235
- 1
- 9
8
votes
2 answers
Which classification algorithms are negatively affected by class imbalances?
I've seen a few posts and papers floating around the web (mostly those related to over/undersampling, SMOTE, and cost-sensitive training) that, when discussing class imbalance, specify that certain algorithms are negatively impacted by class…

Danny David Leybzon
- 180
- 2
8
votes
4 answers
What is the term for when a model acts on the thing being modeled and thus changes the concept?
I'm trying to see if there is a conventional term for this concept to help me in my literature research and writing. When a machine learning model causes an action to be taken in the real world that affects future instances, what is that called? …

jsmith54
- 83
- 2
8
votes
1 answer
What are the input and output channels of a convolution in PyTorch?
From the PyTorch documentation for convolution, I saw that the function torch.nn.Conv1d requires users to pass the parameters "in_channels" and "out_channels". I know they refer to input channels and output channels, but I am not sure about what they…

LastK7
- 101
- 1
- 1
- 3
8
votes
4 answers
XGBoost Huge Dataset ~1TB
Can a gradient boosting solution like XGBoost or LightGBM be used for a huge amount of data? I have a CSV file of 820GB containing 1 billion observations, and each observation has 650 data points.
Is this too much data for XGBoost? I have searched…

Medz Benz
- 81
- 1
- 2
8
votes
3 answers
How to find out if two datasets are close to each other?
I have the following three datasets.
data_a=[0.21,0.24,0.36,0.56,0.67,0.72,0.74,0.83,0.84,0.87,0.91,0.94,0.97]
data_b=[0.13,0.21,0.27,0.34,0.36,0.45,0.49,0.65,0.66,0.90]
data_c=[0.14,0.18,0.19,0.33,0.45,0.47,0.55,0.75,0.78,0.82]
data_a is real data…

Kartikeya Sharma
- 167
- 1
- 9
8
votes
1 answer
What makes binary cross entropy a better choice for binary classification than other loss functions?
I'm reading this post, where I came across this quote: "Cross-entropy is the default loss function to use for binary classification problems."
But what about it makes it the default and presumably best loss function for binary classification?

John Slaine
- 81
- 1
- 2
8
votes
3 answers
Why does the logistic function use e rather than 2?
The sigmoid function can be used as an activation function in machine learning.
$$S(x)=\frac{1}{1+e^{-x}}=\frac{e^{x}}{e^{x}+1}.$$
If we substitute e with 2,
def sigmoid2(z):
    return 1/(1+2**(-z))
x = np.arange(-9,9,dtype=float)
y…

JJJohn
- 623
- 10
- 23
8
votes
2 answers
Why is class weight outperforming oversampling?
I am applying both class_weight and oversampling (SMOTE) techniques on a multiclass classification problem and getting better results when using the class_weight technique. Could someone please explain what could be the cause of this difference?

Sarah
- 611
- 2
- 5
- 17