Most Popular
1500 questions
8
votes
2 answers
How should I use BERT embeddings for clustering (as opposed to fine-tuning a BERT model for a supervised task)?
First of all, I want to say that I am asking this question because I am interested in using BERT embeddings as document features to do clustering. I am using Transformers from the Hugging Face library. I was thinking of averaging all of the Word…

fractalnature
- 805
- 6
- 19
8
votes
4 answers
Understanding how convolutional layers work
After working with a CNN using Keras and the MNIST dataset for the well-known handwritten digit recognition problem, I came up with some questions about how the convolutional layer works. I can understand what the convolution process is.
My first…

Karampistis Dimitrios
- 93
- 1
- 4
8
votes
4 answers
Does reinforcement learning require the help of other learning algorithms?
Can't reinforcement learning be used without the help of other learning algorithms like SVM and MLP back propagation? I consulted two papers:
Paper 1
Paper 2
both have used other machine learning methods in the inner loop.

girl101
- 1,161
- 2
- 11
- 26
8
votes
3 answers
Are there any machine learning techniques to identify points on plots/images?
I have data for each vehicle's lateral position over time and lane number as shown in these 3 plots in the image and sample data below.
> a
Frame.ID xcoord Lane
1 452 27.39400 3
2 453 27.38331 3
3 454 27.42999 3
4 …

umair durrani
- 344
- 2
- 8
8
votes
2 answers
Can a linear regression model without polynomial features overfit?
I've read in some articles on the internet that linear regression can overfit. However, is that possible when we are not using polynomial features? We are just fitting a line through the data points when we have one feature or a plane when we have…

Tim von Känel
- 361
- 1
- 10
8
votes
4 answers
Job title similarity
I'm trying to define a metric between job titles in the IT field. For this I need some metric between words of job titles that do not appear together in the same job title, e.g. a metric between the words
senior, primary, lead, head, vp, director,…

Mher
- 181
- 5
8
votes
1 answer
Anybody know what this type of visualisation is called?
I think this is a pretty cool way to visualise changes in values but I can’t find any name for this type of visualisation.
Source: https://www.economist.com/graphic-detail/2020/07/28/americans-are-getting-more-nervous-about-what-they-say-in-public

K G
- 183
- 3
8
votes
3 answers
Should you use random state or random seed in machine learning models?
I'm starting to study machine learning. In all the examples I saw, the person who created the ML model used a random state or a random seed to stop the randomness of the process. But, in real life, when you're trying to apply a machine learning model…

Caldass_
- 167
- 1
- 7
8
votes
3 answers
Modality of data
Can anyone please explain in clear words what is generally meant by "modality of data"?
I know what modality means with respect to distributions.

Julia
- 81
- 1
- 2
8
votes
3 answers
BERT Transformer: Why does BERT use the [CLS] token for classification instead of an average over all tokens?
I am doing experiments on the BERT architecture and found that most fine-tuning tasks take the final hidden layer as the text representation and later pass it to other models for the downstream task.
BERT's last layer looks like this…

Aaditya ura
- 415
- 5
- 16
8
votes
2 answers
Do I need validation data if my train and test accuracy/loss are consistent?
I am trying to understand the purpose of a 3rd split in the form of a validation dataset. I am not necessarily talking about cross-validation here.
In the scenario below, it would appear that the model is overfit to the training dataset.
Train…

Kermit
- 529
- 5
- 17
8
votes
2 answers
Is overfitting okay if test accuracy is high enough?
I am trying to build a binary classifier. I have tried deep neural networks with various different structures and parameters and I was not able to get anything better than
Train set accuracy : 0.70102
Test set accuracy : 0.70001
Then I tried…

skrrrt
- 304
- 2
- 13
8
votes
2 answers
Why do scikit-learn and statsmodels provide different coefficients of determination?
First of all, I know there is a similar question; however, I didn't find it very helpful.
My issue concerns simple linear regression and the resulting R-squared. I found that results can be quite different if I use statsmodels and…

Luckasino
- 183
- 1
- 4
8
votes
1 answer
Which ML approach to choose for the game AI when rewards are delayed?
Question: Which Machine Learning approach should I choose for the AI of my computer game, where the actions of the AI do not lead to immediate rewards, but delayed rewards instead?
About me:
I am a complete beginner in the area of machine learning.…

Logende
- 61
- 4
8
votes
1 answer
Keras Early Stopping: Monitor 'loss' or 'val_loss'?
I often use "early stopping" when I train neural nets, e.g. in Keras:
from keras.callbacks import EarlyStopping
# Define early stopping as callback
early_stopping = EarlyStopping(monitor='loss', patience=5, mode='auto',…

Peter
- 7,446
- 5
- 19
- 49