Training Keras Towards Or Against Analog Value?

Question

For example, if I want to build a cat and mouse AI, the cat would want to minimize the time it takes to catch the mouse, and the mouse would want to maximize that time. The time is a continuous (analog) value, so I cannot use a traditional supervised X/y method and need a method that goes like this:

network.train_against_value(X, y, determinator)

Here, X is more like where the cat and mouse are. y is where the cat or mouse should move, and determinator is the time taken for the mouse to be caught, where the mouse wishes to maximize this value through its output of y and the cat wishes to minimize it. There is one Xy pair for each decision made by the cat and mouse, but one determinator throughout one game. Many games are played to train the AI.

Example: X: (300, 300, 200, 200) -> (mousex, mousey, catx, caty)

y: (1, 3) -> (xmove, ymove) direction; the code then rescales these numbers so that the actual movement always has length 1.

Determinator: 50 -> time for mouse to be caught in seconds

The network would train so that, for every X it is given, it outputs a y that makes determinator as small as possible. Is there a method for train_towards_value as well? If there is no prebuilt method, how do I create one? What is the technical name for this kind of training?

I have two neural networks for the cat and mouse, where the cat is slower than the mouse but is larger and could eat the mouse. Just consider the mouse is difficult to control from the neural network because of inefficiencies so that it is possible for the cat to catch the mouse.

Don't you just want 2 different neural networks (1 for cat and 1 for mouse)? In this case you can just train to a minimum of time or a maximum, right? — Lustwelpintje, Sep 18 '19 at 07:43
Could you make it clearer what X represents for you? I'm guessing that it has something to do with what the cat or mouse observes. Would you describe them as agents or bots in some kind of game? Perhaps explain a bit more what you mean by a "cat and mouse AI" . . .use [edit] to add details to the question, don't leave it all in comments. — Neil Slater, Sep 18 '19 at 12:01
It may also help if you give some simplified concrete examples of how you expect train_against_value to work with actual values for X, y and determinator. I don't think it can be done like that, nor that is what you want, but seeing how you are thinking in detail will really help towards writing an answer. — Neil Slater, Sep 18 '19 at 14:07

score 0 · Accepted Answer · edited Jun 17 '20 at 09:57

What is the technical name for this kind of training?

The name for the problem is Sequential Decision Making or Optimal Control.

There are a few different approaches you can take when solving this kind of problem. However, I think that the way that you are describing your project, Reinforcement Learning (RL) would match your approach the best.

I cannot use a traditional Xy method but need another method that goes like this:

network.train_against_value(X, y, determinator)

Although this method signature could probably be made to work by restructuring one of the RL frameworks, that is not the usual design. When using RL with neural networks, it is more common to treat the experience-gathering and scoring systems (the core ideas behind RL) as a data generator for a supervised learning problem, or in some cases as a way of directly producing the output layer gradients.
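To illustrate the "data generator" idea, here is a minimal sketch that turns one finished game into per-decision training labels. It is not taken from any RL framework. It assumes every decision lasts a fixed `step_seconds` and that the mouse earns that much reward for each step it survives, so the label for a decision is the (optionally discounted) time remaining until capture. The names `label_episode`, `step_seconds` and `gamma` are hypothetical.

```python
def label_episode(decisions, step_seconds=1.0, gamma=1.0):
    """Attach a value label to every (state, action) pair of one game.

    decisions: list of (state, action) tuples in the order they were made.
    Returns a list of (state, action, value), where value is the discounted
    time the mouse still survived after making that decision.
    """
    labelled = []
    remaining = 0.0
    # Walk backwards so each label reuses the one computed for the next step.
    for state, action in reversed(decisions):
        remaining = step_seconds + gamma * remaining
        labelled.append((state, action, remaining))
    labelled.reverse()
    return labelled


# A three-decision game: the mouse was caught after 3 steps.
game = [
    ((300, 300, 200, 200), (1, 3)),
    ((301, 303, 202, 202), (1, 3)),
    ((302, 306, 204, 204), (3, 1)),
]
labels = label_episode(game)
```

The resulting rows form an ordinary regression dataset (inputs: state and action, target: value), so they could be passed to a Keras model with `model.fit`. Under the same assumptions, the cat could reuse these labels with the sign flipped.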

One approach you could use is called Deep Q Networks (DQN) which effectively generates mini-batches for supervised learning of a neural network, based on gathering experiences.
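The "gathering experiences" part is often implemented as a replay buffer. The sketch below is a simplified, hypothetical version in plain Python (the class name `ReplayBuffer` and its methods are illustrative, not a Keras or keras-rl API): it stores transitions and hands back random mini-batches for the supervised update.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=10000):
        # A deque with maxlen drops the oldest experience once it is full.
        self.items = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.items.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Never ask for more items than are currently stored.
        size = min(batch_size, len(self.items))
        return random.sample(list(self.items), size)


# Toy usage with integer states; a real game would store positions instead.
buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(state=t, action=0, reward=1.0, next_state=t + 1, done=(t == 4))
batch = buffer.sample(batch_size=2)
```

Sampling at random rather than in game order is a common design choice, because consecutive decisions from one game are strongly correlated.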

In brief, DQN trains a neural network to predict what you are calling the "determinator", which RL would call the "value" of each action. So you may move the action choice (y) to the input of the neural network, or alternatively predict multiple values (one output for each possible action). The RL theory is the same for both approaches; the choice is an implementation detail. The agent (cat or mouse) would then pick the action with the highest predicted value by default (with the cat's values being the negative of the mouse's), while trying other actions at random during training so as to learn all values. Whenever the agent gained new experience, it would add that to the training data so it could improve its predictions.
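The action choice and the training target described here can be sketched as follows. This is a simplified illustration in plain Python, not the full DQN algorithm: it assumes a small discrete set of moves, values expressed from the mouse's point of view, and an opponent treated as part of the environment. The function names, `epsilon` and `gamma` are hypothetical.

```python
import random


def choose_action(values, epsilon, is_cat=False):
    """Pick an action index given one mouse-perspective value per action."""
    if random.random() < epsilon:
        return random.randrange(len(values))  # explore: try a random action
    # The cat wants short games, so it maximises the negated values.
    scores = [-v for v in values] if is_cat else list(values)
    return max(range(len(scores)), key=scores.__getitem__)


def q_target(reward, next_values, done, gamma=0.99):
    """One-step Q-learning target for the action that was actually taken."""
    if done:
        return reward  # no future value once the mouse has been caught
    return reward + gamma * max(next_values)


predicted = [4.0, 9.0, 2.5]  # hypothetical predicted survival times for 3 moves
mouse_move = choose_action(predicted, epsilon=0.0)
cat_move = choose_action(predicted, epsilon=0.0, is_cat=True)
target = q_target(reward=1.0, next_values=predicted, done=False, gamma=0.5)
```

With `epsilon=0.0` both agents act greedily: the mouse picks the move with the largest predicted time and the cat picks the move with the smallest. The value returned by `q_target` is what the network's output for the chosen action would be trained towards.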

RL is a complex subject in its own right, and to begin understanding it properly you would do well to start by tackling even simpler problems that don't require neural networks or running two agents against each other. There is a good introductory book on the subject with the option to read a free PDF version: Reinforcement Learning: An Introduction (Sutton & Barto)

So is there a Keras library or something where I can plug and play the RL? — Aphrodite, Sep 19 '19 at 10:25
@Aphrodite: There is https://github.com/keras-rl/keras-rl but it won't be plug and play from your current design, you will need to modify it to fit RL. It has examples and tutorials, so you might be able to figure out from those. — Neil Slater, Sep 19 '19 at 10:33
