Most Popular

1500 questions
5 votes • 1 answer

Is the LSTM component a neuron or a layer?

Given the standard illustrative feed-forward neural net model, with the dots as neurons and the lines as neuron-to-neuron connections, what part is the (unfolded) LSTM cell (see picture)? Is it a neuron (a dot) or a layer?
MScott • 445 • 4 • 13
5 votes • 1 answer

How powerful are OpenAI's Gym and Universe in the board-game area?

I'm a big fan of computer board games and would like to make Python chess/go/shogi/mancala programs. Having heard of reinforcement learning, I decided to look at OpenAI Gym. But first of all, I would like to know: is it possible using OpenAI…
Taissa • 63 • 4
5 votes • 2 answers

What is "Computational Linguistics"?

It's not clear to me whether or not someone whose work aims to improve an NLP system may be called a "Computational Linguist" even when she/he doesn't modify the algorithm directly by coding. Let's consider the following activities: Annotation for…
franz1 • 173 • 4
5 votes • 2 answers

What are examples of approaches to dimensionality reduction of feature vectors?

Given a pre-trained CNN model, I extract feature vectors of images in the reference and query datasets, each with several thousand elements. I would like to apply some dimensionality-reduction techniques to reduce the feature vector dimension to speed up cosine…
doplano • 299 • 3 • 10
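As a hedged aside, one standard technique the question is asking about is principal component analysis (PCA), computed here via SVD. All array sizes, variable names, and random data below are illustrative stand-ins for the CNN features described in the excerpt:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 1000 reference feature vectors of dimension 2048,
# as might come from a pre-trained CNN's penultimate layer.
feats = rng.standard_normal((1000, 2048))

# PCA via SVD on mean-centred data: keep the top-k principal components.
k = 128
mean = feats.mean(axis=0)
centred = feats - mean
_, _, vt = np.linalg.svd(centred, full_matrices=False)
components = vt[:k]                      # (k, 2048) projection matrix

reduced = centred @ components.T         # (1000, 128) compressed features

# Query vectors must be projected with the SAME mean and components.
query = rng.standard_normal((1, 2048))
query_reduced = (query - mean) @ components.T

print(reduced.shape, query_reduced.shape)  # (1000, 128) (1, 128)
```

Cosine similarity over the 128-dimensional vectors is then roughly 16x cheaper per comparison than over the raw 2048-dimensional features.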
5 votes • 1 answer

Which deep learning models are suitable for image-to-image mapping?

I am working on a problem in which I need to train a neural network to map one or more input images to one or more output images (1 channel per image). Below I report some examples of input & output. In this case I report 1 input and 1 output image,…
5 votes • 1 answer

Autoencoder produces repeated artifacts after convergence

As an experiment, I have tried using an autoencoder to encode height data from the Alps; however, the decoded image is very pixelated after training for several hours, as shown in the image below. This repeating pattern is larger than the final kernel…
Yadeses • 231 • 2 • 5
5 votes • 1 answer

Why is a softmax used rather than dividing each activation by the sum?

Just wondering why a softmax is typically used in practice on the outputs of most neural nets, rather than just summing the activations and dividing each activation by the sum. I know it's roughly the same thing, but what is the mathematical reasoning…
user8714896 • 797 • 1 • 6 • 24
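As a hedged sketch of the distinction this question draws: dividing raw activations by their sum breaks down for negative activations (and for sums near zero), while softmax exponentiates first so every output is a valid probability. The values below are made up for illustration:

```python
import numpy as np

def naive_normalise(z):
    # Dividing raw activations by their sum can yield "probabilities"
    # outside [0, 1] when some activations are negative.
    return z / z.sum()

def softmax(z):
    # Exponentiation maps every activation to a positive number, so the
    # result is always a valid distribution. Subtracting the max is the
    # standard numerical-stability trick (it cancels in the ratio).
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([2.0, -1.0, 0.5])
print(naive_normalise(z))  # contains a negative "probability"
print(softmax(z))          # all entries in (0, 1), summing to 1
```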
5 votes • 1 answer

Why do we average gradients and not loss in distributed training?

I'm running some distributed trainings in Tensorflow with Horovod. It runs training separately on multiple workers, each of which uses the same weights and does forward pass on unique data. Computed gradients are averaged within the communicator…
pSoLT • 161 • 2
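A hedged aside on the question's premise: by linearity of differentiation, averaging per-worker gradients is the same as taking the gradient of the averaged loss. A toy scalar model (names and targets here are illustrative, not Horovod code) makes the identity concrete:

```python
import numpy as np

# Toy model: scalar weight w, per-worker loss L_i(w) = (w - t_i)^2
# with gradient dL_i/dw = 2 * (w - t_i). The targets t_i stand in for
# each worker's unique data shard.
w = 0.5
targets = np.array([1.0, 2.0, -3.0, 0.25])

per_worker_grads = 2.0 * (w - targets)
avg_of_grads = per_worker_grads.mean()

# Gradient of the averaged loss, computed analytically:
# d/dw mean_i (w - t_i)^2 = 2 * (w - mean(t_i))
grad_of_avg_loss = 2.0 * (w - targets.mean())

# By linearity of differentiation the two coincide.
print(np.isclose(avg_of_grads, grad_of_avg_loss))  # True
```

Averaging gradients rather than losses avoids shipping each worker's loss graph around: only the already-computed gradient tensors need to be communicated.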
5 votes • 1 answer

Is running more epochs really a direct cause of overfitting?

I've seen some comments in online articles/tutorials or Stack Overflow questions which suggest that increasing the number of epochs can result in overfitting. But my intuition tells me that there should be no direct relationship at all between the…
Alexander Soare • 1,339 • 2 • 11 • 27
5 votes • 1 answer

What is a "batch" in batch normalization?

I'm working on an example of CNN with the MNIST hand-written numbers dataset. Currently I've got convolution -> pool -> dense -> dense, and for the optimiser I'm using Mini-Batch Gradient Descent with a batch size of 32. Now this concept of batch…
Alexander Soare • 1,339 • 2 • 11 • 27
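As a hedged sketch answering the terminology question: at train time, the "batch" in batch normalization is the same mini-batch the optimizer uses, and the normalization statistics are computed per feature across that mini-batch. Sizes below mirror the batch size of 32 from the excerpt; everything else is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# One mini-batch of 32 examples with 64 features each.
x = rng.standard_normal((32, 64))

eps = 1e-5
# Train-time batch norm: mean and variance are computed PER FEATURE
# across the 32 examples of the current mini-batch.
mu = x.mean(axis=0)
var = x.var(axis=0)
x_hat = (x - mu) / np.sqrt(var + eps)

# Learnable scale and shift (gamma, beta) restore expressiveness.
gamma, beta = np.ones(64), np.zeros(64)
y = gamma * x_hat + beta

print(y.mean(axis=0).max())  # ~0 for every feature
```

At inference time there is no mini-batch to average over, so frameworks substitute running estimates of `mu` and `var` accumulated during training.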
5 votes • 0 answers

Training and inference for highly-context-sensitive information

What is the best way to train / do inference when the context matters highly as to what the inferred result should be? For example in the image below all people are standing upright, but because of the perspective of the camera, their location…
g491 • 101 • 2
5 votes • 1 answer

Are neurons in layer $l$ only affected by neurons in the previous layer?

Are artificial neurons in layer $l$ only affected by those in layer $l-1$ (providing inputs) or are they also affected by neurons in layer $l$ (and maybe by neurons in other layers)?
George White • 194 • 1 • 9
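A hedged numerical aside on the question: in a plain feed-forward network, layer $l$ depends only on layer $l-1$, via $a_l = f(W_l a_{l-1} + b_l)$; recurrent layers are the usual exception, where a layer also receives its own activations from the previous time step. All sizes and names below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

a_prev = rng.standard_normal(4)   # activations of layer l-1
W = rng.standard_normal((3, 4))   # weights into layer l
b = np.zeros(3)

# Feed-forward: layer l is a function of layer l-1 only.
a_l = np.tanh(W @ a_prev + b)

# Recurrent exception: a hypothetical weight matrix U feeds the layer's
# own previous-time-step activations back into itself.
U = rng.standard_normal((3, 3))
a_l_next = np.tanh(W @ a_prev + U @ a_l + b)

print(a_l.shape, a_l_next.shape)  # (3,) (3,)
```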
5 votes • 1 answer

How can we prove that an auto-associator network will continue to perform if we zero the diagonal elements of a weight matrix?

How can we prove that an auto-associator network will continue to perform if we zero the diagonal elements of a weight matrix that has been determined by the Hebb rule? In other words, suppose that the weight matrix is determined from $W = PP^T-…
estamos • 157 • 1 • 12
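A numerical sketch (not a proof) of the claim: each diagonal element $W_{ii}$ only feeds neuron $i$ back to itself with a pattern-independent weight, so zeroing the diagonal shrinks every neuron's self-excitation equally and leaves the recall signs unchanged when the stored patterns are sufficiently uncorrelated. The patterns below are hypothetical random bipolar vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Store m bipolar patterns of dimension n with the Hebb rule W = P P^T.
n, m = 64, 3
P = rng.choice([-1.0, 1.0], size=(n, m))   # columns are stored patterns
W = P @ P.T

p = P[:, 0]
recall_full = np.sign(W @ p)               # recall with full W

# Zero the diagonal: for bipolar patterns every W_ii equals m, so this
# subtracts the same constant self-feedback from every neuron.
W0 = W - np.diag(np.diag(W))
recall_zeroed = np.sign(W0 @ p)            # recall is unchanged

print(np.array_equal(recall_full, p), np.array_equal(recall_zeroed, p))
```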
5 votes • 1 answer

When exactly is a model considered over-parameterized?

When exactly is a model considered over-parameterized? There is some recent research in deep learning on the role of over-parameterization in generalization, so it would be nice to know exactly what counts as such. A…
Phúc Lê • 161 • 5
5 votes • 2 answers

What is Statistical relational learning?

I have gone through the Wikipedia explanation of SRL, but it only confused me more: Statistical relational learning (SRL) is a subdiscipline of artificial intelligence and machine learning that is concerned with domain models that exhibit both…
Dawny33 • 1,371 • 13 • 29