The activation function you choose depends on the application you are building and the data you have to work with, so it is hard to recommend one over the other without taking this into account.
Here is a short summary of the advantages and disadvantages of some common activation functions:
https://missinglink.ai/guides/neural-network-concepts/7-types-neural-network-activation-functions-right/
What does the author mean by "ReLU when I'm dealing with positive values, and a linear function when I'm dealing with general values"?
ReLU is a good choice for inputs > 0, since ReLU outputs 0 for any input < 0 (which can kill the neuron, because the gradient there is also 0).
To remedy this, you could look into using a Leaky ReLU instead, which avoids killing the neuron by keeping a small non-zero slope for inputs <= 0.
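To make the difference concrete, here is a minimal NumPy sketch of the two functions (the slope `alpha=0.01` is just a common default I picked for illustration, not something prescribed by the linked guide):

```python
import numpy as np

def relu(x):
    # ReLU: 0 for negative inputs, identity for positive inputs
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Leaky ReLU: small negative slope (alpha) instead of a flat zero
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # [ 0.     0.     0.     1.5 ]  -> gradient is 0 for x < 0 ("dead" region)
print(leaky_relu(x))  # [-0.02  -0.005  0.     1.5 ]  -> gradient is alpha for x < 0
```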
There are two major drawbacks of linear activation functions:
1. You can't use back-propagation effectively in training, since the derivative is a constant and does not convey which weights influenced the output the most.
2. Linear activation functions are only applicable to shallow networks, since stacking multiple layers of linear functions just yields another linear function (the derivative is still a constant); see the sketch below.
So, if you have a shallow network that does not rely on backpropagation, then you can use a linear activation function.
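As a quick check of point 2, here is a small NumPy sketch (the layer sizes and random weights are arbitrary, chosen only for the demonstration) showing that two stacked linear layers collapse into a single linear map:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
b1, b2 = rng.normal(size=4), rng.normal(size=2)

def two_linear_layers(x):
    # Two layers with the identity (linear) activation in between
    return W2 @ (W1 @ x + b1) + b2

# The equivalent single linear layer
W = W2 @ W1
b = W2 @ b1 + b2

x = rng.normal(size=3)
print(np.allclose(two_linear_layers(x), W @ x + b))  # True
```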
– Krrrl Oct 22 '19 at 13:38