Highest Voted Questions - Data Science Stack Exchange

87

votes

6 answers

strings as features in decision tree/random forest

I am working on some problems that apply decision trees/random forests. I am trying to fit a problem which has numbers as well as strings (such as country name) as features. Now the library, scikit-learn, takes only numbers as parameters, but I…

asked Feb 25 '15 at 01:07

user3001408

1,005
1
10
8

87

votes

4 answers

How are 1x1 convolutions the same as a fully connected layer?

I recently read Yann LeCun's comment on 1x1 convolutions: In Convolutional Nets, there is no such thing as "fully-connected layers". There are only convolution layers with 1x1 convolution kernels and a full connection table. It's a…

asked Jul 17 '16 at 13:23

Martin Thoma

18,880
35
95
169

86

votes

8 answers

Time series prediction using ARIMA vs LSTM

The problem that I am dealing with is predicting time series values. I am looking at one time series at a time and, based on, for example, 15% of the input data, I would like to predict its future values. So far I have come across two models: LSTM…

asked Jul 11 '16 at 16:45

ahajib

1,075
1
9
15

84

votes

6 answers

Cosine similarity versus dot product as distance metrics

It looks like the cosine similarity of two features is just their dot product scaled by the product of their magnitudes. When does cosine similarity make a better distance metric than the dot product? I.e. do the dot product and cosine similarity…

classification

asked Jul 15 '14 at 21:30

ahoffer

943
1
7
7
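
The relationship stated in the cosine similarity question above (the dot product scaled by the product of the magnitudes) can be illustrated with a minimal sketch. The function names `dot` and `cosine_similarity` and the sample vectors are illustrative assumptions, not part of the original question.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Dot product divided by the product of the two vector magnitudes."""
    norm_u = math.sqrt(dot(u, u))
    norm_v = math.sqrt(dot(v, v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(u, v) / (norm_u * norm_v)


# Hypothetical sample vectors: b points in the same direction as a
# but is twice as long.
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]

print(dot(a, b))                # grows with the length of the vectors
print(cosine_similarity(a, b))  # depends only on the angle between them
```

In this sketch, scaling either vector changes the dot product but leaves the cosine similarity unchanged, which is the practical difference the question asks about.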

83

votes

5 answers

GBM vs XGBOOST? Key differences?

I am trying to understand the key differences between GBM and XGBOOST. I tried to google it, but could not find any good answers explaining the differences between the two algorithms and why xgboost almost always performs better than GBM. What makes…

asked Feb 11 '17 at 20:03

Aman

997
1
8
8

83

votes

5 answers

What is the difference between "equivariant to translation" and "invariant to translation"?

I'm having trouble understanding the difference between equivariant to translation and invariant to translation. In the book Deep Learning (I. Goodfellow, A. Courville, and Y. Bengio, MIT Press, 2016), one can find on the convolutional…

asked Jan 04 '17 at 08:41

Aamir

993
1
7
6

81

votes

9 answers

Data scientist vs machine learning engineer

What are the differences, if any, between a "data scientist" and a "machine learning engineer"? Over the past year or so "machine learning engineer" has started to show up a lot in job postings. This is particularly noticeable in San Francisco,…

machine-learning

asked Feb 20 '18 at 06:15

Ryan Zotti

4,149
3
19
32

78

votes

6 answers

What is the difference between Gradient Descent and Stochastic Gradient Descent?

What is the difference between Gradient Descent and Stochastic Gradient Descent? I am not very familiar with these, can you describe the difference with a short example?

asked Aug 04 '18 at 06:36

Developer

1,099
2
9
11

77

votes

4 answers

Convert a list of lists into a Pandas DataFrame

I am trying to convert a list of lists which looks like the following into a Pandas DataFrame: [['New York Yankees ', '"Acevedo Juan" ', 900000, ' Pitcher\n'], ['New York Yankees ', '"Anderson Jason"', 300000, ' Pitcher\n'], ['New York Yankees ',…

pandas

asked Jan 05 '18 at 18:40

Aravind Veluchamy

871
1
6
3
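
For the list-of-lists question above, one possible approach is to pass the nested list directly to the `pd.DataFrame` constructor. This is a minimal sketch that uses only the two rows fully visible in the excerpt. The column names (`team`, `player`, `salary`, `position`) are assumptions, since the question excerpt does not name them, and the whitespace and quote cleanup is an optional extra step.

```python
import pandas as pd

# The two complete rows shown in the question excerpt.
rows = [
    ['New York Yankees ', '"Acevedo Juan" ', 900000, ' Pitcher\n'],
    ['New York Yankees ', '"Anderson Jason"', 300000, ' Pitcher\n'],
]

# Hypothetical column names; the excerpt does not specify any.
columns = ["team", "player", "salary", "position"]

# Each inner list becomes one row of the DataFrame.
df = pd.DataFrame(rows, columns=columns)

# Optional cleanup: drop surrounding whitespace, then surrounding quotes.
for col in ["team", "player", "position"]:
    df[col] = df[col].str.strip().str.strip('"')

print(df)
```

Without the `columns` argument, pandas would label the columns with integers starting at 0.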

77

votes

10 answers

How to clone Python working environment on another machine?

I developed a machine learning model with Python (Anaconda + Flask) on my workstation and all goes well. Later, I tried to ship this program onto another machine where of course I tried to set up the same environment, but the program fails to run. I…

asked Oct 26 '17 at 12:36

Hendrik

8,587
17
42
55

75

votes

6 answers

Cross-entropy loss explanation

Suppose I build a neural network for classification. The last layer is a dense layer with Softmax activation. I have five different classes to classify. Suppose for a single training example, the true label is [1 0 0 0 0] while the predictions be…

asked Jul 10 '17 at 10:26

enterML

3,031
9
27
38

73

votes

7 answers

Open source Anomaly Detection in Python

Problem Background: I am working on a project that involves log files similar to those found in the IT monitoring space (to the best of my understanding of the IT space). These log files are time-series data, organized into hundreds/thousands of rows of…

asked Jul 22 '15 at 14:26

ximiki

933
1
7
15

71

votes

2 answers

Are Support Vector Machines still considered "state of the art" in their niche?

This question is in response to a comment I saw on another question. The comment was regarding the Machine Learning course syllabus on Coursera, and along the lines of "SVMs are not used so much nowadays". I have only just finished the relevant…

asked Jul 09 '14 at 12:22

Neil Slater

28,918
4
80
100

71

votes

4 answers

What is the purpose of the [CLS] token and why is its encoding output important?

I am reading this article on how to use BERT by Jay Alammar and I understand things up until: For sentence classification, we’re only interested in BERT’s output for the [CLS] token, so we select that slice of the cube and discard everything…

asked Jan 09 '20 at 17:20

user3768495

927
1
7
8

70

votes

11 answers

What is dimensionality reduction? What is the difference between feature selection and extraction?

From Wikipedia: dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration, and can be divided into feature selection and feature extraction. What is the difference between feature…

asked May 18 '14 at 06:26

alvas

2,410
7
25
40
