Highest Voted Questions - Data Science Stack Exchange

46

votes

5 answers

How to force weights to be non-negative in Linear regression

I am fitting a standard linear regression with scikit-learn in Python. However, I would like to force the weights to be non-negative for every feature. Is there any way I can accomplish that? I was looking in the documentation but could not find…

asked Apr 11 '17 at 03:02

user

1,993
6
21
38

44

votes

6 answers

What is the relationship between the accuracy and the loss in deep learning?

I have created three different models using deep learning for multi-class classification, and each model gave me a different accuracy and loss value. The test results are as follows: First Model: Accuracy: 98.1% Loss: 0.1882 Second…

asked Dec 14 '18 at 09:08

N.IT

1,995
4
19
35
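
A minimal sketch of why accuracy and loss can diverge, assuming the loss in question is categorical cross-entropy (the excerpt does not say which loss was used). The toy probabilities and the helper names `accuracy` and `cross_entropy` are hypothetical, chosen only for illustration. Accuracy looks only at the arg-max class, while cross-entropy also penalizes low confidence in the correct class:

```python
import math

def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class equals the label."""
    hits = 0
    for p, y in zip(probs, labels):
        predicted = max(range(len(p)), key=p.__getitem__)
        hits += int(predicted == y)
    return hits / len(labels)

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

labels = [0, 1, 2]
# Both toy "models" pick the correct class for every sample...
confident = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]
hesitant = [[0.40, 0.30, 0.30], [0.30, 0.40, 0.30], [0.30, 0.30, 0.40]]

# ...so their accuracy is identical, but the hesitant model has a higher loss.
print(accuracy(confident, labels), cross_entropy(confident, labels))
print(accuracy(hesitant, labels), cross_entropy(hesitant, labels))
```

Under these assumptions, two models can share the same accuracy while reporting different loss values, so the two numbers need not rank models identically.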

44

votes

6 answers

How can I transform names in a confidential data set to make it anonymous, but preserve some of the characteristics of the names?

Motivation I work with datasets that contain personally identifiable information (PII) and sometimes need to share part of a dataset with third parties, in a way that doesn't expose PII and subject my employer to liability. Our usual approach here…

asked Jun 16 '14 at 19:48

Air

822
9
20

44

votes

2 answers

Should we apply normalization to test data as well?

I am doing a project on an author identification problem. I applied tf-idf normalization to the training data and then trained an SVM on that data. Now, when using the classifier, should I normalize the test data as well? I feel that the basic aim of…

asked Feb 08 '18 at 16:53

Kishan Kumar

665
2
7
11

44

votes

4 answers

Multi GPU in Keras

How can we program in the Keras library (or TensorFlow) to partition training across multiple GPUs? Let's say that you are on an Amazon EC2 instance that has 8 GPUs and you would like to use all of them to train faster, but your code is just for a…

asked Oct 18 '17 at 20:30

Hector Blandin

579
1
7
11

44

votes

9 answers

Why are Machine Learning models called black boxes?

I was reading this blog post titled "The Financial World Wants to Open AI’s Black Boxes", where the author repeatedly refers to ML models as "black boxes". Similar terminology has been used in several places when referring to ML models. Why is it…

asked Aug 17 '17 at 11:53

Dawny33

8,296
12
48
104

44

votes

2 answers

LightGBM vs XGBoost

I'm trying to understand which is better (more accurate, especially in classification problems). I've been searching for articles comparing LightGBM and XGBoost but found only…

xgboost

asked May 11 '17 at 12:12

Sergey Nizhevyasov

553
1
4
4

44

votes

6 answers

Encoding features like month and hour as categorical or numeric?

Is it better to encode features like month and hour as factor or numeric in a machine learning model? On the one hand, I feel numeric encoding might be reasonable, because time is a forward progressing process (the fifth month is followed by the…

asked Mar 22 '17 at 07:43

Funkwecker

595
1
5
13

44

votes

5 answers

Intuitive explanation of Noise Contrastive Estimation (NCE) loss?

I read about NCE (a form of candidate sampling) from these two sources: the TensorFlow writeup and the original paper. Can someone help me with the following: A simple explanation of how NCE works (I found the above difficult to parse and get an understanding…

asked Aug 05 '16 at 03:36

tejaskhot

4,065
7
20
18

43

votes

10 answers

Do data scientists use Excel?

I would consider myself a journeyman data scientist. Like most (I think), I made my first charts and did my first aggregations in high school and college, using Excel. As I went through college, grad school and ~7 years of work experience, I…

asked Apr 03 '15 at 20:28

JHowIX

533
1
4
6

43

votes

2 answers

What is GELU activation?

I was going through the BERT paper, which uses GELU (Gaussian Error Linear Unit) and states the equation as $$\mathrm{GELU}(x) = x P(X \le x) = x \Phi(x),$$ which in turn is approximated by $$0.5x\left(1 + \tanh\left[\sqrt{2/\pi}\,\left(x + 0.044715x^3\right)\right]\right).$$ Could you simplify the equation…

asked Apr 18 '19 at 08:06

thanatoz

2,405
4
16
39
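
The two expressions quoted in the question can be compared numerically. The sketch below uses only the Python standard library; the function names `gelu_exact` and `gelu_tanh` are illustrative choices, not taken from the BERT paper or any particular library. It assumes that Φ is the standard normal CDF, written here via the error function as Φ(x) = 0.5(1 + erf(x/√2)).

```python
import math

def gelu_exact(x: float) -> float:
    """GELU(x) = x * Phi(x), with Phi the standard normal CDF."""
    phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return x * phi

def gelu_tanh(x: float) -> float:
    """Tanh-based approximation quoted in the question."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

# Print both forms side by side on a few sample inputs.
for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"x={x:+.1f}  exact={gelu_exact(x):+.6f}  tanh={gelu_tanh(x):+.6f}")
```

On these sample points the two forms agree closely, which is why the tanh expression is usable as a stand-in for x·Φ(x).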

43

votes

13 answers

Data science related funny quotes

It has been customary for the users of different communities to quote funny things about their fields. It may be fun to share your funny things about Machine Learning, Deep Learning, Data Science and the things that you face every day!

asked Dec 14 '18 at 14:37

Green Falcon

14,058
9
57
98

43

votes

6 answers

When would one use Manhattan distance as opposed to Euclidean distance?

I am trying to find a good argument for why one would use the Manhattan distance over the Euclidean distance in machine learning. The closest thing I found to a good argument so far is in this MIT lecture. At 36:15 you can see on the slides the…

asked Jun 30 '17 at 06:28

Bitcoin Cash - ADA enthusiast

609
1
7
12
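
For reference, the two distances named in the question can be written in a few lines. This is a minimal sketch with hypothetical helper names and a made-up pair of points; it only illustrates the definitions and does not argue for either metric.

```python
import math

def manhattan(a, b):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """L2 distance: square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

p, q = (0.0, 0.0), (3.0, 4.0)
print(manhattan(p, q))  # 7.0
print(euclidean(p, q))  # 5.0
```

Because the Euclidean distance squares each coordinate difference, a single large difference dominates it more than it dominates the Manhattan distance.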

42

votes

5 answers

In the context of Deep Learning, what are training warmup steps?

I found the term "training warmup steps" in some of the papers. What exactly does this term mean? Has it got anything to do with "learning rate"? If so, how does it affect it?

asked Jul 19 '19 at 10:10

Ashwin Geet D'Sa

1,129
2
9
20
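
One common reading of the term, stated here as an assumption because the excerpt does not define it, is that the learning rate is ramped up from a small value to its target over the first few training steps. The sketch below shows a linear ramp; the function name `warmup_lr` and its parameters are hypothetical, and real training setups often combine warmup with a decay schedule afterwards.

```python
def warmup_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Linearly ramp the learning rate up to base_lr over warmup_steps steps.

    `step` is zero-based. After the warmup phase the rate stays at base_lr
    (an assumption; no decay is modeled in this sketch).
    """
    if warmup_steps <= 0 or step >= warmup_steps:
        return base_lr
    return base_lr * (step + 1) / warmup_steps

# Learning rate over the first 12 steps with a 10-step warmup.
schedule = [warmup_lr(s, base_lr=0.1, warmup_steps=10) for s in range(12)]
print(schedule)
```

Under this reading, warmup steps relate directly to the learning rate: they control how many steps the ramp lasts.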

42

votes

10 answers

Can machine learning algorithms predict sports scores or plays?

I have a variety of NFL datasets that I think might make a good side-project, but I haven't done anything with them just yet. Coming to this site made me think of machine learning algorithms, and I am wondering how good they might be at either…

asked Jun 10 '14 at 10:58

Steve Kallestad

3,128
4
21
39
