Most Popular

1500 questions
8 votes
4 answers

Why is there a difference between predicting on Validation set and Test set?

I have an XGBoost model trying to predict whether a currency will go up or down in the next period (5 min). I have a dataset from 2004 to 2018. I randomly split the data into 95% train and 5% validation, and the accuracy on the validation set is up to 55%. When…
DBSE
8 votes
1 answer

Complex Chunking with NLTK

I am trying to figure out how to use NLTK's cascading chunker as per Chapter 7 of the NLTK book. Unfortunately, I'm running into a few issues when performing non-trivial chunking measures. Let's start with this phrase: "adventure movies between 2000…
grill
8 votes
1 answer

Gensim LDA model: return keywords based on relevance (λ - lambda) value

I am using the gensim library for topic modeling, more specifically LDA. I created my corpus, my dictionary, and my LDA model. With the help of the pyLDAvis library I visualized the results. When I print the words with the highest probability on…
8 votes
1 answer

Which classification algorithms to try for classifying text data into 300 categories

I have 40,000 rows of text data from the health care domain. The data has one column for the text (2-5 sentences) and one column for its category. I want to classify the text into 300 categories. Some categories are independent while some are somewhat related.…
Alok Nayak
8 votes
2 answers

How to use Graph Neural Network to predict relationships between nodes with pytorch_geometric?

Let's say I have a partly connected graph that represents members of many unrelated communities. I would like to predict the possible friendships between members of the same community: on a sliding scale from 0 to 10, how likely would they like…
Soerendip
8 votes
5 answers

What is the best question generation state of art with nlp?

I was trying out various projects available for question generation on GitHub, namely NQG, question-generation, and a lot of others, but I don't see good results from them: either they have very bad question formation or the questions generated are…
Jack109
8 votes
2 answers

Why is taking the gradient of the average error in SGD not correct, but rather the average of the gradients of single errors?

I am a little confused about taking averages in cost functions and SGD. So far I always thought in SGD you would compute the average error for a batch and then backpropagate it. But then I was told in a comment on this question that that was wrong.…
lo tolmencre
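As a rough illustration of the distinction this question raises, the sketch below assumes a one-parameter linear model with squared error (the model, data, and function names are hypothetical, not taken from the question). Under that assumption, the gradient of the batch-mean loss matches the mean of the per-example gradients, since differentiation is linear, whereas differentiating the loss of the averaged residual generally gives a different number.

```python
# Hypothetical toy setup: prediction = w * x, per-example loss = (w * x - y) ** 2.

def per_example_grad(w, x, y):
    """Analytic d/dw of (w * x - y) ** 2 for one example."""
    return 2.0 * (w * x - y) * x

def mean_loss(w, xs, ys):
    """Mean of the per-example squared errors over the batch."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def loss_of_mean_residual(w, xs, ys):
    """Squared error applied to the batch-averaged residual (a different quantity)."""
    r = sum(w * x - y for x, y in zip(xs, ys)) / len(xs)
    return r ** 2

def numeric_grad(f, w, eps=1e-6):
    """Central finite-difference estimate of df/dw."""
    return (f(w + eps) - f(w - eps)) / (2.0 * eps)

# Made-up batch and parameter value, chosen only for illustration.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 3.0, 7.0]
w = 0.5

mean_of_grads = sum(per_example_grad(w, x, y) for x, y in zip(xs, ys)) / len(xs)
grad_of_mean_loss = numeric_grad(lambda v: mean_loss(v, xs, ys), w)
grad_of_loss_of_mean = numeric_grad(lambda v: loss_of_mean_residual(v, xs, ys), w)

print(mean_of_grads, grad_of_mean_loss, grad_of_loss_of_mean)
```

The first two printed values should agree up to finite-difference error, while the third differs, which may be the source of the confusion described in the excerpt.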
8 votes
2 answers

Which classification algorithms are negatively affected by class imbalances?

I've seen a few posts and papers floating around the web (mostly those related to over/undersampling, SMOTE, and cost-sensitive training) that, when discussing class imbalance, specify that certain algorithms are negatively impacted by class…
8 votes
4 answers

What is the term for when a model acts on the thing being modeled and thus changes the concept?

I'm trying to see if there is a conventional term for this concept to help me in my literature research and writing. When a machine learning model causes an action to be taken in the real world that affects future instances, what is that called? …
jsmith54
8 votes
1 answer

What are the input and output channels of a convolution in PyTorch?

From the PyTorch documentation for convolution, I saw that the function torch.nn.Conv1d requires users to pass the parameters "in_channels" and "out_channels". I know they refer to input channels and output channels, but I am not sure about what they…
LastK7
8 votes
4 answers

XGBoost Huge Dataset ~1TB

Can a gradient boosting solution like XGBoost or LightGBM be used for a huge amount of data? I have a CSV file of 820GB containing 1 billion observations, and each observation has 650 data points. Is this too much data for XGBoost? I have searched…
Medz Benz
8 votes
3 answers

How to find out if two datasets are close to each other?

I have the following three datasets.
data_a=[0.21,0.24,0.36,0.56,0.67,0.72,0.74,0.83,0.84,0.87,0.91,0.94,0.97]
data_b=[0.13,0.21,0.27,0.34,0.36,0.45,0.49,0.65,0.66,0.90]
data_c=[0.14,0.18,0.19,0.33,0.45,0.47,0.55,0.75,0.78,0.82]
data_a is real data…
8 votes
1 answer

What makes binary cross entropy a better choice for binary classification than other loss functions?

I'm reading this post, where I came across this quote: "Cross-entropy is the default loss function to use for binary classification problems." But what about it makes it the default, and presumably the best, loss function for binary classification?
John Slaine
8 votes
3 answers

Why does logistic function use e rather than 2?

The sigmoid function can be used as an activation function in machine learning. $$S(x)=\frac{1}{1+e^{-x}}=\frac{e^{x}}{e^{x}+1}$$ If we substitute e with 2, def sigmoid2(z): return 1/(1+2**(-z)) x = np.arange(-9,9,dtype=float) y…
JJJohn
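One way to see how the base-2 variant in this question relates to the usual logistic function: since 2^(-z) = e^(-z ln 2), swapping e for 2 only rescales the input by ln 2. The sketch below checks that identity numerically using only the standard library; `sigmoid` is a name introduced here for illustration, while `sigmoid2` follows the definition quoted in the excerpt.

```python
import math

def sigmoid(z):
    """Standard logistic function with base e."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid2(z):
    """Base-2 variant as written in the question excerpt."""
    return 1.0 / (1.0 + 2.0 ** (-z))

# 2 ** (-z) == exp(-z * ln 2), so sigmoid2(z) should equal sigmoid(z * ln 2).
zs = [float(z) for z in range(-9, 9)]
max_gap = max(abs(sigmoid2(z) - sigmoid(z * math.log(2.0))) for z in zs)

print(max_gap)  # expected to be at floating-point rounding level
```

Under this view, the choice of base changes only the steepness of the curve, not its shape.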
8 votes
2 answers

Why class weight is outperforming oversampling?

I am applying both class_weight and oversampling (SMOTE) techniques on a multiclass classification problem and getting better results when using the class_weight technique. Could someone please explain what could be the cause of this difference?
Sarah