Most Popular

1500 questions
22 votes · 2 answers

How to define states in reinforcement learning?

I am studying reinforcement learning and its variants. I am starting to get an understanding of how the algorithms work and how they apply to an MDP. What I don't understand is the process of defining the states of the MDP. In most examples…
Andy • 323

22 votes · 1 answer

How does the (decoder-only) transformer architecture work?

How does the (decoder-only) transformer architecture, which is used in impressive models such as GPT-4, work?
Robin van Hoorn • 2,366

22 votes · 2 answers

Why would you implement the position-wise feed-forward network of the transformer with convolution layers?

The Transformer model introduced in "Attention is all you need" by Vaswani et al. incorporates a so-called position-wise feed-forward network (FFN): In addition to attention sub-layers, each of the layers in our encoder and decoder contains a…
Eli Korvigo • 321
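
The question's premise is that a position-wise FFN, applied independently at each position, computes exactly the same function as a 1D convolution with kernel size 1. A minimal sketch of that equivalence (my own illustration with assumed toy shapes, not code from the paper):

```python
import torch
import torch.nn as nn

# Assumed toy dimensions, not taken from the paper.
d_model, d_ff, seq_len = 8, 32, 5
x = torch.randn(1, seq_len, d_model)                # (batch, positions, features)

linear = nn.Linear(d_model, d_ff)
conv = nn.Conv1d(d_model, d_ff, kernel_size=1)

# Copy the weights so both layers compute the same function.
with torch.no_grad():
    conv.weight.copy_(linear.weight.unsqueeze(-1))  # (d_ff, d_model, 1)
    conv.bias.copy_(linear.bias)

out_linear = linear(x)                              # applied to each position independently
out_conv = conv(x.transpose(1, 2)).transpose(1, 2)  # Conv1d expects (batch, channels, length)

print(torch.allclose(out_linear, out_conv, atol=1e-6))  # True
```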

22 votes · 1 answer

Has the Lovelace Test 2.0 been successfully used in an academic setting?

In October 2014, Dr. Mark Riedl published an approach to testing AI intelligence, called the "Lovelace Test 2.0", after being inspired by the original Lovelace Test (published in 2001). Mark believed that the original Lovelace Test would be…
Left SE On 10_6_19 • 1,660

22 votes · 3 answers

Why doesn't Q-learning converge when using function approximation?

The tabular Q-learning algorithm is guaranteed to find the optimal $Q$ function, $Q^*$, provided the following conditions (the Robbins-Monro conditions) regarding the learning rate are satisfied: $\sum_{t} \alpha_t(s, a) = \infty$ and $\sum_{t}…
nbro • 40,472
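
For reference, the Robbins-Monro step-size conditions that the excerpt starts to list are usually stated as follows (standard formulation, reproduced here only because the excerpt is cut off):

$$\sum_{t} \alpha_t(s, a) = \infty \qquad \text{and} \qquad \sum_{t} \alpha_t^2(s, a) < \infty$$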

21 votes · 5 answers

Why does Batch Normalization work?

Adding BatchNorm layers improves training time and makes the whole deep model more stable. That's an experimental fact that is widely used in machine learning practice. My question is - why does it work? The original (2015) paper motivated the…
Kostya • 2,515
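
To make the question's premise concrete: at training time a BatchNorm layer normalizes each feature over the current mini-batch and then applies a learned scale and shift. A minimal sketch (my own illustration; running statistics and the inference-time path are omitted):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """x: (batch, features). Training-time batch normalization."""
    mean = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalize each feature
    return gamma * x_hat + beta              # learned scale and shift
```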

21 votes · 3 answers

Is a dystopian surveillance state computationally possible?

This isn't really a conspiracy theory question. It's more of an inquiry into global computational power and data-storage logistics. Most recording instruments, such as cameras and microphones, are typically voluntary opt-in devices, in that,…
Harrison Tran • 319

21 votes · 2 answers

What is the difference between First-Visit Monte-Carlo and Every-Visit Monte-Carlo Policy Evaluation?

I came across these two algorithms, but I cannot understand the difference between them, either in terms of implementation or intuition. So, what difference does the second point in both slides refer to?
user9947
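
A minimal sketch of the distinction the question is asking about (my own illustration, not taken from the slides): the two methods differ only in whether a state's return is recorded the first time the state appears in an episode or every time it appears. It assumes an episode given as a list of (state, reward) pairs and a discount factor gamma.

```python
def mc_returns(episode, gamma, first_visit=True):
    """Return {state: [sampled returns]} for one episode.

    episode: list of (state, reward) pairs, where reward is the one received
    after leaving that state (an assumed convention for this sketch).
    """
    # Compute the return G_t for every time step, working backwards.
    G, Gs = 0.0, [0.0] * len(episode)
    for t in reversed(range(len(episode))):
        _, reward = episode[t]
        G = reward + gamma * G
        Gs[t] = G

    returns, seen = {}, set()
    for t, (state, _) in enumerate(episode):
        if first_visit and state in seen:
            continue          # first-visit MC: ignore repeat visits to a state
        seen.add(state)       # every-visit MC records all occurrences
        returns.setdefault(state, []).append(Gs[t])
    return returns
```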

20 votes · 1 answer

Why do you not see dropout layers on reinforcement learning examples?

I've been looking at reinforcement learning, and specifically playing around with creating my own environments to use with the OpenAI Gym AI. I am using agents from the stable_baselines project to test with it. One thing I've noticed in virtually…
Matt Hamilton • 333

20 votes · 4 answers

Why do we need floats for using neural networks?

Is it possible to make a neural network that uses only integers by scaling the input and output of each function to [-INT_MAX, INT_MAX]? Are there any drawbacks?
elimohl • 311
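
As a toy illustration of the fixed-point scaling the question describes (my own sketch, not from the question): represent every value as an integer with an implicit scale factor and rescale after each multiply-accumulate. The scale of 2^8 below is an arbitrary assumption.

```python
SCALE = 2 ** 8          # assumed fixed-point scale: real value ≈ integer / SCALE

def quantize(x):
    return int(round(x * SCALE))

def int_dot(int_weights, int_inputs):
    # Each product carries scale SCALE * SCALE, so divide once to return to SCALE.
    acc = sum(w * v for w, v in zip(int_weights, int_inputs))
    return acc // SCALE

w = [quantize(v) for v in [0.5, -1.25, 2.0]]
x = [quantize(v) for v in [1.0, 0.8, -0.1]]
print(int_dot(w, x) / SCALE)   # ≈ 0.5*1.0 - 1.25*0.8 + 2.0*(-0.1) = -0.7
```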

20 votes · 3 answers

How are Artificial Neural Networks and the Biological Neural Networks similar and different?

I've heard multiple times that "Neural Networks are the best approximation we have to model the human brain", and I think it is commonly known that Neural Networks are modelled after our brain. I strongly suspect that this model has been simplified,…

20 votes · 3 answers

How can we process the data from both the true distribution and the generator?

I'm struggling to understand the GAN loss function as provided in Understanding Generative Adversarial Networks (a blog post written by Daniel Seita). In the standard cross-entropy loss, we have an output that has been run through a sigmoid function…
tryingtolearn • 385
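
For context, the minimax objective from the original GAN paper (Goodfellow et al., 2014), which the blog post's loss is derived from: the two expectations are exactly the two data sources the question mentions, real samples and generator samples, combined as the two halves of a binary cross-entropy for the discriminator.

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$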

20 votes · 2 answers

How do neural networks play chess?

I have been spending a few days trying to wrap my head around how and why neural networks are used to play chess. Although I know very little about how the game of chess works, I can understand the following idea. Theoretically, we could make a…
stats_noob • 329

20 votes · 2 answers

Why does GPT-2 Exclude the Transformer Encoder?

After looking into transformers, BERT, and GPT-2, from what I understand, GPT-2 essentially uses only the decoder part of the original transformer architecture and uses masked self-attention that can only look at prior tokens. Why does GPT-2 not…
Athena Wisdom • 351
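
A minimal sketch of the masked ("causal") self-attention the excerpt describes, in which each position may attend only to itself and earlier positions (my own illustration with assumed numpy inputs, not GPT-2 code):

```python
import numpy as np

def causal_self_attention(Q, K, V):
    """Q, K, V: (seq_len, d) arrays for a single attention head."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # (seq_len, seq_len)
    mask = np.triu(np.ones_like(scores), k=1)       # 1s strictly above the diagonal
    scores = np.where(mask == 1, -1e9, scores)      # block attention to future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V
```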

20 votes · 2 answers

What is the "Hello World" problem of Reinforcement Learning?

As we all know, "Hello World" is usually the first program that any programmer learns/implements in any language/framework. Aurélien Géron mentioned in his book that MNIST is often called the Hello World of Machine Learning, so is there any "Hello…
Arpit-Gole • 394