Most Popular

1500 questions
5 votes, 1 answer

Why do we need importance sampling?

I was studying the off-policy policy improvement method when I encountered importance sampling. I completely understood the mathematics behind the calculation, but I am wondering what a practical example of importance sampling would be. For instance,…
5 votes, 1 answer

What's the difference between content-based attention and dot-product attention?

I'm following this blog post which enumerates the various types of attention. It mentions content-based attention where the alignment scoring function for the $j$th encoder hidden state with respect to the $i$th context vector is the cosine…
Alexander Soare
5 votes, 4 answers

What is the fundamental difference between an ML model and a function?

A model can be roughly defined as any design that is able to solve an ML task. Examples of models are the neural network, decision tree, Markov network, etc. A function can be defined as a set of ordered pairs with a many-to-one mapping from a domain…
hanugm
5 votes, 1 answer

Why are multiplayer, imperfect-information, trick-taking card games hard for AI?

AI has reached a super-human level in many complex games such as Chess, Go, Texas hold'em Poker, Dota2 and StarCraft2. However, it has still not reached this level in trick-taking card games. Why is there no super-human AI playing imperfect-information,…
Cohensius
5 votes, 1 answer

Wasserstein GAN: Implementation of Critic Loss Correct?

The WGAN paper concretely proposes Algorithm 1 (cf. page 8). Now, they also state what their loss for the critic and the generator is. When implementing the critic loss (so lines 5 and 6 of Algorithm 1), they maximize the parameters $w$ (instead of…
5 votes, 2 answers

What's the difference between architectures and backbones?

In the paper "ForestNet: Classifying Drivers of Deforestation in Indonesia using Deep Learning on Satellite Imagery", the authors talk about using Feature Pyramid Networks (as the architecture) and EfficientNet-B2 (as the backbone). Performance…
5 votes, 1 answer

Multi-Armed Bandits with a large number of arms

I'm dealing with a (stochastic) Multi-Armed Bandit (MAB) with a large number of arms. Consider a pizza machine that produces a pizza depending on an input $i$ (equivalent to an arm). The (finite) set of arms $K$ is given by $K=X_1\times X_2 \times…
D. B.
5 votes, 2 answers

How to deal with the time delay in reinforcement learning?

I have a question regarding time delay in reinforcement learning (RL). In RL, one has a state, a reward and an action. It is usually assumed (as far as I understand it) that when the action is executed on the system, the state changes immediately…
jengmge
5 votes, 1 answer

Why is second-order backpropagation useful?

Raul Rojas's book on Neural Networks dedicates section 8.4.3 to explaining how to do second-order backpropagation, that is, computing the Hessian of the error function with respect to two weights at a time. What problems are easier to solve using…
EmmanuelMess
5 votes, 2 answers

Transformers: how does the decoder final layer output the desired token?

In the paper Attention Is All You Need, this section confuses me: In our model, we share the same weight matrix between the two embedding layers [in the encoding section] and the pre-softmax linear transformation [output of the decoding…
user3667125
5 votes, 1 answer

Can AlphaFold predict proteins with metals well?

There are certain proteins that contain metal components, known as metalloproteins. Commonly, the metal is at the active site which needs the most prediction precision. Typically, there is only one (or a few) metals in a protein, which contains far…
jw_
5 votes, 2 answers

How to detect a full-fledged self-aware AI?

The premise: A full-fledged self-aware artificial intelligence may have come to exist in a distributed environment like the internet. The possible A.I. in question may be quite unwilling to reveal itself. The question: Given a first initial…
user4327
5 votes, 1 answer

Why does off-policy learning outperform on-policy learning?

I am self-studying reinforcement learning using different online resources. I now have a basic understanding of how RL works. I saw this in a book: Q-learning is an off-policy learner. An off-policy learner learns the value of an optimal…
Exploring
5 votes, 1 answer

Why would a VAE train much better with batch sizes closer to 1 over batch size of 100+?

I've been training a VAE to reconstruct human names. When I train it with a batch size of 100+, after about 5 hours of training it tends to just output the same thing regardless of the input. I'm using teacher forcing as well. When I use a lower…
user8714896
5 votes, 2 answers

Given two optimal policies, is an affine combination of them also optimal?

If there are two different optimal policies $\pi_1, \pi_2$ in a reinforcement learning task, will the linear combination (or affine combination) of the two policies $\alpha \pi_1 + \beta \pi_2, \alpha + \beta = 1$ also be an optimal policy? Here I…
yang liu