Most Popular

1500 questions
87 votes
6 answers

strings as features in decision tree/random forest

I am doing some problems on an application of decision tree/random forest. I am trying to fit a problem which has numbers as well as strings (such as country name) as features. Now the library, scikit-learn, takes only numbers as parameters, but I…
user3001408
87 votes
4 answers

How are 1x1 convolutions the same as a fully connected layer?

I recently read Yann LeCun's comment on 1x1 convolutions: In Convolutional Nets, there is no such thing as "fully-connected layers". There are only convolution layers with 1x1 convolution kernels and a full connection table. It's a…
Martin Thoma
86 votes
8 answers

Time series prediction using ARIMA vs LSTM

The problem that I am dealing with is predicting time series values. I am looking at one time series at a time and, based on for example 15% of the input data, I would like to predict its future values. So far I have come across two models: LSTM…
ahajib
84 votes
6 answers

Cosine similarity versus dot product as distance metrics

It looks like the cosine similarity of two features is just their dot product scaled by the product of their magnitudes. When does cosine similarity make a better distance metric than the dot product? I.e. do the dot product and cosine similarity…
ahoffer
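
The relationship stated in the excerpt above (cosine similarity as the dot product divided by the product of the two magnitudes) can be sketched in a few lines. This is only an illustration in plain Python; the helper names `dot` and `cosine_similarity` and the sample vectors are assumptions made for the example, not part of the original question.

```python
import math


def dot(u, v):
    # Plain dot product: sum of element-wise products.
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    # Dot product scaled by the product of the two vector magnitudes.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# Hypothetical sample vectors: v points in the same direction as u
# but is twice as long.
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]

print(dot(u, v))                # grows with the lengths of the vectors
print(cosine_similarity(u, v))  # depends only on the angle between them
```

Scaling either vector changes the dot product but leaves the cosine similarity (up to floating-point rounding) unchanged, which is the distinction the question is asking about.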
83 votes
5 answers

GBM vs XGBOOST? Key differences?

I am trying to understand the key differences between GBM and XGBOOST. I tried to google it, but could not find any good answers explaining the differences between the two algorithms and why xgboost almost always performs better than GBM. What makes…
Aman
83 votes
5 answers

What is the difference between "equivariant to translation" and "invariant to translation"

I'm having trouble understanding the difference between equivariant to translation and invariant to translation. In the book Deep Learning. MIT Press, 2016 (I. Goodfellow, A. Courville, and Y. Bengio), one can find on the convolutional…
Aamir
81 votes
9 answers

Data scientist vs machine learning engineer

What are the differences, if any, between a "data scientist" and a "machine learning engineer"? Over the past year or so "machine learning engineer" has started to show up a lot in job postings. This is particularly noticeable in San Francisco,…
Ryan Zotti
78 votes
6 answers

What is the difference between Gradient Descent and Stochastic Gradient Descent?

What is the difference between Gradient Descent and Stochastic Gradient Descent? I am not very familiar with these; can you describe the difference with a short example?
Developer
77 votes
4 answers

Convert a list of lists into a Pandas Dataframe

I am trying to convert a list of lists which looks like the following into a Pandas DataFrame: [['New York Yankees ', '"Acevedo Juan" ', 900000, ' Pitcher\n'], ['New York Yankees ', '"Anderson Jason"', 300000, ' Pitcher\n'], ['New York Yankees ',…
Aravind Veluchamy
77 votes
10 answers

How to clone Python working environment on another machine?

I developed a machine learning model with Python (Anaconda + Flask) on my workstation and all goes well. Later, I tried to ship this program onto another machine where of course I tried to set up the same environment, but the program fails to run. I…
Hendrik
75 votes
6 answers

Cross-entropy loss explanation

Suppose I build a neural network for classification. The last layer is a dense layer with Softmax activation. I have five different classes to classify. Suppose for a single training example, the true label is [1 0 0 0 0] while the predictions be…
enterML
73 votes
7 answers

Open source Anomaly Detection in Python

Problem Background: I am working on a project that involves log files similar to those found in the IT monitoring space (to the best of my understanding of the IT space). These log files are time-series data, organized into hundreds/thousands of rows of…
ximiki
71 votes
2 answers

Are Support Vector Machines still considered "state of the art" in their niche?

This question is in response to a comment I saw on another question. The comment was regarding the Machine Learning course syllabus on Coursera, and along the lines of "SVMs are not used so much nowadays". I have only just finished the relevant…
Neil Slater
71 votes
4 answers

What is the purpose of the [CLS] token and why is its encoding output important?

I am reading this article on how to use BERT by Jay Alammar and I understand things up until: For sentence classification, we’re only interested in BERT’s output for the [CLS] token, so we select that slice of the cube and discard everything…
user3768495
70 votes
11 answers

What is dimensionality reduction? What is the difference between feature selection and extraction?

From Wikipedia: dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration, and can be divided into feature selection and feature extraction. What is the difference between feature…
alvas