Questions tagged [deep-learning]

An area of Machine Learning research concerned with learning hierarchical representations of data, mainly using deep neural networks (i.e. networks with two or more hidden layers), but also with certain kinds of Probabilistic Graphical Models.

What is Deep Learning?

Deep Learning is an area of Machine Learning which attempts to learn complex functions by using special architectures composed of many layers (hence the term "deep").

Deep architectures allow more complex tasks to be learned, not only because the additional layers can perform more transformations, but because the greater depth lets a hierarchical organization of functionality emerge: early layers learn simple features that later layers combine into increasingly abstract representations.

Deep Learning was introduced into machine learning research with the intention of moving machine learning closer to artificial intelligence. A significant impact of deep learning lies in feature learning: it mitigates much of the manual feature-engineering effort required by shallower, non-deep approaches.
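
As a minimal illustration of what "composed of many layers" means in practice, here is a hedged sketch of a small fully-connected network in PyTorch (the layer sizes are arbitrary, chosen only for illustration):

    import torch.nn as nn

    # A "deep" network: several stacked hidden layers, each transforming
    # the representation produced by the layer before it.
    model = nn.Sequential(
        nn.Linear(784, 256), nn.ReLU(),   # first hidden layer
        nn.Linear(256, 128), nn.ReLU(),   # second hidden layer
        nn.Linear(128, 64), nn.ReLU(),    # third hidden layer
        nn.Linear(64, 10),                # output layer, e.g. 10 classes
    )

Each Linear/ReLU pair is one learned transformation; stacking several of them is what the "two or more hidden layers" in the tag description refers to.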


New to Deep Learning?

There are a variety of resources, including books and tutorials/workshops, for those looking to learn more about Deep Learning.

A popular introductory tutorial is the SciPy 2020 Conference Tutorial.

Some popular introductory books:


Resources

Papers

Books

Videos

Stack Exchange Sites

Other Stack Exchange sites with a Deep Learning tag:

4871 questions
13 votes · 2 answers

Sort numbers using only 2 hidden layers

I'm reading the cornerstone paper Sequence to Sequence Learning with Neural Networks by Ilya Sutskever and Quoc Le. On the first page, it briefly mentions that: A surprising example of the power of DNNs is their ability to sort N N-bit numbers…
aerin (907)

13 votes · 2 answers

Deep learning for non-image, non-NLP tasks?

So far there are many interesting applications of deep learning in computer vision and natural language processing. What about other, more traditional fields? For example, I have traditional socio-demographic variables plus maybe a lot of lab…
spore234 (603)

10 votes · 2 answers

ReLU has 0 gradient by definition, so why isn't the vanishing gradient a problem for x < 0?

By definition, ReLU is max(0, f(x)). Its gradient is then defined as 1 if x > 0 and 0 if x < 0. Wouldn't this mean the gradient always vanishes (is 0) when x < 0? Then why do we say ReLU doesn't suffer from the vanishing gradient problem?
Edamame (2,745)
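
A quick way to reproduce the behavior the question describes, as a small PyTorch sketch (the input values are arbitrary):

    import torch

    x = torch.tensor([-2.0, -0.5, 0.5, 2.0], requires_grad=True)
    torch.relu(x).sum().backward()
    print(x.grad)  # tensor([0., 0., 1., 1.]): zero for x < 0, one for x > 0

The usual answer is that "vanishing gradients" refers to gradients shrinking multiplicatively across many layers (as with saturating sigmoids); on its active side a ReLU passes gradients through with slope exactly 1, while a unit that outputs 0 for all inputs is the separate "dying ReLU" problem.
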
9 votes · 3 answers

How to use Cross Entropy loss in PyTorch for binary prediction?

In the PyTorch docs, it says for cross entropy loss: input has to be a Tensor of size (minibatch, C). Does this mean that for binary (0,1) prediction, the input must be converted into an (N,2) tensor where the second dimension is equal to (1-p)? So…
AAC (509)
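
For reference, a minimal sketch of the two standard options in PyTorch (shapes and values are illustrative):

    import torch
    import torch.nn as nn

    targets = torch.tensor([0, 1, 1, 0])       # binary labels, shape (N,)

    # Option 1: treat it as 2-class classification. CrossEntropyLoss takes
    # raw logits of shape (N, 2) and applies the softmax itself.
    loss_ce = nn.CrossEntropyLoss()(torch.randn(4, 2), targets)

    # Option 2: one logit per example with BCEWithLogitsLoss, which
    # applies the sigmoid internally; no (p, 1-p) pair is needed.
    loss_bce = nn.BCEWithLogitsLoss()(torch.randn(4), targets.float())

With option 1 there is no need to construct the (1-p) column by hand; the second logit plays that role.
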
7 votes · 1 answer

What is the relationship between "landmark detection" and "landmark localization"?

I am reading the paper "Grand Challenge of 106-Point Facial Landmark Localization". In the context of face recognition, "Landmark Detection" is to detect a face by matching landmarks on a face, while "Landmark Localization" is to predict the coordinates of…
whnlp (171)

7 votes · 1 answer

What is missing from the following Curriculum Learning implementation in a Deep Neural Net?

First of all, we have a classification task, so we use the typical softmax cross entropy to classify. The current implementation of curriculum learning is as follows. First we train our best version of the neural net. At the last epoch we get all of the…
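
For context, a typical curriculum-learning loop trains on the easiest examples first and grows the training set toward the hardest ones. A hedged, self-contained sketch (the data are random stand-ins, and scoring difficulty by the current model's per-example loss is one common choice, not necessarily the questioner's):

    import torch
    import torch.nn as nn

    X, y = torch.randn(200, 10), torch.randint(0, 2, (200,))  # toy data
    model = nn.Linear(10, 2)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss(reduction="none")

    # Rank examples from easy to hard by per-example loss.
    with torch.no_grad():
        order = loss_fn(model(X), y).argsort()

    # Train on a growing easy-to-hard prefix of the data.
    for fraction in (0.25, 0.5, 0.75, 1.0):
        idx = order[: int(fraction * len(order))]
        opt.zero_grad()
        loss_fn(model(X[idx]), y[idx]).mean().backward()
        opt.step()
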
7 votes · 2 answers

Question about the simple example for batch normalization given in the "Deep Learning" book

In the section about batch normalization of the Deep Learning book by Ian Goodfellow (chapter link) there is the following text: As an example, suppose we have a deep neural network that has only one unit per layer and does not use an activation function…
amit (181)
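
The example in the book is a deep linear chain, ŷ = x·w₁·w₂·…·w_l, with one unit per layer and no activations. A tiny NumPy sketch of why such a chain is awkward to train (the weight values are arbitrary):

    import numpy as np

    x = 1.0
    w = np.full(10, 1.5)       # 10 layers, each weight a bit above 1
    print(x * np.prod(w))      # ~57.7: output grows like 1.5**10

    w = np.full(10, 0.5)       # each weight a bit below 1
    print(x * np.prod(w))      # ~0.001: output collapses instead

Because the output is a product over every layer's weight, the effect of a simultaneous gradient update to all layers is highly non-linear, which is the pathology the book uses to motivate batch normalization.
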
6 votes · 3 answers

Intuition behind the number of output neurons for a neural network

I am reading Michael Nielsen's book on deep learning. In the first chapter, he gives the classic example of classifying 10 handwritten digits, and uses it to explain the intuition behind choosing the number of output neurons. Initially, before…
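
For reference, the architecture that chapter discusses, as a short sketch (Nielsen's example is a 784-30-10 network; the prediction is read off the most active of the 10 output neurons):

    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(784, 30), nn.Sigmoid(), nn.Linear(30, 10))
    image = torch.rand(1, 784)            # a flattened 28x28 digit image
    digit = net(image).argmax(dim=1)      # index of the most active output
    print(digit)                          # predicted class, 0-9

The book's point is that one neuron per class (10 outputs) trains more easily than a 4-neuron binary encoding, even though 4 bits would suffice to represent 10 classes.
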
6 votes · 5 answers

Can we train a neural network to tell if an object is present or not in an image?

I am new to machine learning and working on object detection, but I am not interested in the location of the object in the image. Is it possible to train such a neural network, and if yes, how? (I just want a list of objects present in…
Abstractgears (61)
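
What the question asks for is usually framed as (multi-label) image classification rather than detection: the network outputs one presence score per class and no locations. A hedged sketch (the tiny model, class count, and 0.5 threshold are illustrative):

    import torch
    import torch.nn as nn

    num_classes = 20
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, num_classes))

    image = torch.rand(1, 3, 64, 64)
    probs = torch.sigmoid(model(image))     # independent per-class scores
    present = (probs > 0.5).nonzero()       # classes deemed "present"

Training such a model typically pairs the raw logits with BCEWithLogitsLoss and a multi-hot target vector.
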
5 votes · 3 answers

Do I need to buy an NVIDIA graphics card to run deep learning algorithms?

I am new to deep learning. I am running a MacBook Pro with Yosemite (upgraded from Snow Leopard). I don't have a CUDA-enabled GPU, and running the code on the CPU is extremely slow. I heard that I can buy some instances on AWS, but it seems that they…
Lilianna (153)

4 votes · 1 answer

What does smooth/soft probability mean?

I was recently reading the Knowledge Distillation paper and encountered the term smooth probabilities. The term was used for the case when the logits are divided by a temperature. Neural networks typically produce class probabilities by using a …
Hossein (565)
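
In the distillation paper the softened ("smooth") probabilities come from a temperature-scaled softmax, $p_i = \frac{\exp(z_i/T)}{\sum_j \exp(z_j/T)}$. A small sketch of the effect (the logits and T = 5 are arbitrary):

    import torch

    logits = torch.tensor([4.0, 1.0, 0.0])
    print(torch.softmax(logits, dim=0))        # sharp: ~[0.94, 0.05, 0.02]
    print(torch.softmax(logits / 5.0, dim=0))  # soft:  ~[0.50, 0.27, 0.22]

Raising T spreads probability mass over the non-target classes, exposing the relative similarities the student network is trained to match.
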
3 votes · 0 answers

Why is a model's training faster on Windows than on Ubuntu?

I'm training an object detection model with the TensorFlow Object Detection API. On Windows 10 it looks around 3-4 times faster than on Ubuntu 18.04, and I don't know why. I'm using the same batch size, the same PC, and the same dataset. What could be the problem here…

3 votes · 1 answer

What does this formula in Glorot & Bengio mean?

In this paper, on page 5, we find the formula $$Var(z^i)=Var(x)\prod_{i'=0}^{i-1}n_{i'}Var(W^{i'})$$ I am really struggling to understand what is meant by this formula. I think at least some of the following are true: We're dealing with a linear…
Jack M (265)
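
The formula can be checked numerically for a linear network with independent zero-mean weights, which is the setting the paper analyzes. A sketch (the width, depth, and Var(W) are arbitrary choices; here $n\,Var(W)=1$, so the variance should stay at 1):

    import numpy as np

    rng = np.random.default_rng(0)
    n, var_w, depth = 100, 0.01, 3
    z = rng.normal(size=(n, 20000))     # many samples of the input x

    for _ in range(depth):              # z^{i} = W^{i-1} z^{i-1}
        W = rng.normal(scale=np.sqrt(var_w), size=(n, n))
        z = W @ z

    # Predicted: Var(x) * (n * Var(W)) ** depth = 1 * (100 * 0.01)**3 = 1
    print(z.var())                      # ~1.0
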
3 votes · 1 answer

Why is IoU said to be non-differentiable?

I have been trying to find an answer online, but I couldn't really find one. If anyone could help me, I would appreciate it.
StrickBan (39)
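
One way to see the problem concretely: for non-overlapping boxes the IoU is identically zero, so its gradient with respect to the predicted coordinates carries no signal, and the max/min operations make it only piecewise smooth elsewhere. A sketch with axis-aligned boxes (the coordinates are arbitrary):

    import torch

    def iou(a, b):
        # Boxes as (x1, y1, x2, y2); intersection via max/min.
        ix = (torch.min(a[2], b[2]) - torch.max(a[0], b[0])).clamp(min=0)
        iy = (torch.min(a[3], b[3]) - torch.max(a[1], b[1])).clamp(min=0)
        inter = ix * iy
        union = (a[2] - a[0]) * (a[3] - a[1]) \
              + (b[2] - b[0]) * (b[3] - b[1]) - inter
        return inter / union

    pred = torch.tensor([5.0, 5.0, 6.0, 6.0], requires_grad=True)
    target = torch.tensor([0.0, 0.0, 1.0, 1.0])   # no overlap with pred
    iou(pred, target).backward()
    print(pred.grad)                              # all zeros: no signal

This flat zero-IoU region is part of what motivated differentiable surrogates such as the GIoU loss.
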
3 votes · 1 answer

Why do we want the variance of the layers to remain the same throughout a deep network?

I've been reading the literature on vanishing/exploding gradients and specifically how they connect to weight initialization. An idea I've come across a few times, which seems very important in this area, is that we want the variance to remain the…
Jack M (265)
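
The usual argument, in the notation of the Glorot & Bengio question above and assuming equal layer widths $n$ and weight variance $Var(W)$: after $d$ linear layers,

$$Var(z^d) = Var(x)\left(n\,Var(W)\right)^d,$$

so the activations' variance shrinks geometrically when $n\,Var(W) < 1$ and blows up when $n\,Var(W) > 1$ (and the backward pass behaves the same way). Keeping the per-layer variance constant, $n\,Var(W) = 1$, is the condition that avoids both regimes, and is where Glorot-style initializations come from.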