
I have implemented a neural network with one hidden layer using a sigmoid activation unit. After watching a video on how the ReLU activation function can be much faster, I tried implementing it, but now the cost function comes out as either NaN or inf in Python. I have found that many people have the same problem, but I couldn't find any solution on the internet. I have added a '#changed' comment to the lines that I changed from the previous version, which used the sigmoid activation function. Related Python code snippets:

import numpy as np

def relu(arg):  # I have tried both relu and leaky relu
    return (arg < 0) * 0.0001 * arg + (arg >= 0) * arg

def reluGrad(arg):
    # gradient of relu: 1 where the input is positive, 0 otherwise
    for i in range(arg.shape[0]):
        for j in range(arg.shape[1]):
            if arg[i][j] > 0:
                arg[i][j] = 1
            else:
                arg[i][j] = 0
    return arg

def softmax(x):
    # row-wise softmax; subtracting the max keeps np.exp from overflowing
    x = x.transpose()
    e_x = np.exp(x - np.max(x))
    return (e_x / e_x.sum(axis=0)).transpose()

# forward prop:
a1 = np.insert(data, 0, np.ones(len(data)), 1).astype(np.float64)  # add bias column
z2 = a1.dot(theta1)
a2 = relu(z2)  # changed
a2 = np.insert(a2, 0, np.ones(len(a2)), 1)
z3 = a2.dot(theta2)
a3 = softmax(z3)  # changed

# compute the cost (cross-entropy plus L2 regularization):
cost = -(output*(np.log(a3)) + (1-output)*(np.log(1-a3))).sum()
cost = (1/len(data))*cost + (lamb/(2*len(data)))*((np.delete(theta1,0,0)**2).sum() + (np.delete(theta2,0,0)**2).sum())

# back prop:
sigma3 = a3 - output
sigma2 = (sigma3.dot(np.transpose(theta2))) * reluGrad(np.insert(z2, 0, np.ones(len(z2)), 1))  # changed
sigma2 = np.delete(sigma2, 0, 1)
delta2 = (np.transpose(a2)).dot(sigma3)
delta1 = (np.transpose(a1)).dot(sigma2)

grad1 = delta1/len(data) + (lamb/len(data))*np.insert(np.delete(theta1,0,0),0,np.zeros(len(theta1[0])),0)
grad2 = delta2/len(data) + (lamb/len(data))*np.insert(np.delete(theta2,0,0),0,np.zeros(len(theta2[0])),0)

# update theta
theta1 = theta1 - alpha*grad1
theta2 = theta2 - alpha*grad2

What is causing this problem? How can it be fixed?

Saksham

1 Answer


If a3 is exactly 0 or 1, np.log(a3) or np.log(1-a3) will blow up, since $\log(0)$ is undefined and $\lim_{x \to 0^+} \log(x) = -\infty$.
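
To see this concretely, here is a minimal sketch (the array values are made up for illustration) of what happens to the question's cost expression when the softmax output saturates at exactly 0 and 1:

import numpy as np

a3 = np.array([[1.0, 0.0]])       # softmax output saturated at exactly 1 and 0
output = np.array([[1.0, 0.0]])   # one-hot label

# np.log(0) -> -inf (NumPy only emits a RuntimeWarning), and 0 * -inf -> nan,
# so the summed cross-entropy ends up as nan
cost = -(output * np.log(a3) + (1 - output) * np.log(1 - a3)).sum()
print(cost)  # nan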

geometrikal
  • How can I fix this? – Saksham Jul 04 '17 at 01:15
  • @AyushChaurasia I'm not even sure that is the problem; can you print out the log values to verify? Also see here: https://datascience.stackexchange.com/questions/9302/the-cross-entropy-error-function-in-neural-networks for the suggestion to add a small value $10^{-15}$ inside the log so that it never reaches zero (a sketch of this is shown after these comments). – geometrikal Jul 04 '17 at 06:32
  • It removed the error, but now I am only getting 45-50% accuracy. – Saksham Jul 04 '17 at 08:59
  • Training or validation error? If training, maybe your network is not big enough or needs more layers. – geometrikal Jul 04 '17 at 19:44
  • @geometrikal I was talking about validation error. Everything except the activation function is unchanged, but accuracy has decreased dramatically. – Saksham Jul 06 '17 at 09:02
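
A minimal sketch of that $10^{-15}$ suggestion, assuming a3 and output are the arrays from the question's forward pass (the helper name safe_cross_entropy is just for illustration): clip the softmax output away from exactly 0 and 1 before taking the log.

import numpy as np

def safe_cross_entropy(a3, output, eps=1e-15):
    # clip the softmax output so np.log never sees exactly 0 or 1
    a3 = np.clip(a3, eps, 1 - eps)
    return -(output * np.log(a3) + (1 - output) * np.log(1 - a3)).sum()

# with the saturated example from the answer above, the cost is now a tiny
# finite number instead of nan
print(safe_cross_entropy(np.array([[1.0, 0.0]]), np.array([[1.0, 0.0]])))

The 1/len(data) scaling and the regularization term from the question would then be applied to this value exactly as before.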