How can we program the Keras library (or TensorFlow) to partition training across multiple GPUs? Let's say that you are on an Amazon EC2 instance that has 8 GPUs and you would like to use all of them to train faster, but your code is written for just a single CPU or GPU.
4 Answers
From the Keras FAQ, below is copy-pasted code to enable 'data parallelism', i.e. having each of your GPUs process a different subset of your data independently.
from keras.utils import multi_gpu_model

# Replicates `model` on 8 GPUs.
# This assumes that your machine has 8 available GPUs.
parallel_model = multi_gpu_model(model, gpus=8)
parallel_model.compile(loss='categorical_crossentropy',
                       optimizer='rmsprop')

# This `fit` call will be distributed on 8 GPUs.
# Since the batch size is 256, each GPU will process 32 samples.
parallel_model.fit(x, y, epochs=20, batch_size=256)
Note that this appears to be valid only for the TensorFlow backend at the time of writing.
Update (Feb 2018):
Keras now accepts automatic GPU selection using multi_gpu_model, so you don't have to hardcode the number of GPUs anymore. Details in this Pull Request. In other words, this enables code that looks like this:
try:
    model = multi_gpu_model(model)
except:
    pass
But to be more explicit, you can stick with something like:
parallel_model = multi_gpu_model(model, gpus=None)
Bonus:
To check whether you really are utilizing all of your GPUs (NVIDIA ones, specifically), you can monitor your usage in the terminal using:
watch -n0.5 nvidia-smi
- Does multi_gpu_model(model, gpus=None) work in the case where there is only 1 GPU? It would be cool if it automatically adapted to the number of GPUs available. – CMCDragonkai Aug 30 '18 at 06:25
- Yes, I think it works with 1 GPU (see https://github.com/keras-team/keras/pull/9226#issuecomment-361692460), but you might need to be careful that your code is adapted to run on a multi_gpu_model instead of a simple model. For most cases it probably wouldn't matter, but if you're going to do something like take the output of some intermediate layer, then you'll need to code accordingly (see the sketch after this thread). – weiji14 Sep 06 '18 at 03:35
- You mean something like https://github.com/rossumai/keras-multi-gpu/blob/master/blog/docs/index.md? – weiji14 Sep 10 '18 at 04:17
- That reference was great, @weiji14. However, I'm also interested in how this works for inference. Does Keras somehow split batches equally or round-robin schedule on available model replicas? – CMCDragonkai Dec 10 '18 at 06:00
- I reported the batch size issue during multi-GPU inference: https://github.com/keras-team/keras/issues/11844 – CMCDragonkai Dec 12 '18 at 01:31
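A minimal sketch of the intermediate-layer caveat from the thread above, assuming a built Keras model named model and an input array x; the layer name 'dense_1' is a hypothetical example:

from keras.models import Model
from keras.utils import multi_gpu_model

# Train on the multi-GPU wrapper...
parallel_model = multi_gpu_model(model, gpus=None)

# ...but take intermediate outputs from the original template model,
# since its internal layers are not exposed on the wrapper.
intermediate = Model(inputs=model.input,
                     outputs=model.get_layer('dense_1').output)
features = intermediate.predict(x)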
- For TensorFlow:
Here is sample code showing how device placement works; for each task, you specify the device (or the list of devices) with tf.device:
import tensorflow as tf

# Creates a graph with one matmul per GPU.
c = []
for d in ['/gpu:2', '/gpu:3']:
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
        c.append(tf.matmul(a, b))
# Sums the per-GPU results on the CPU.
with tf.device('/cpu:0'):
    total = tf.add_n(c)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(total))
TensorFlow will use a GPU for computation by default, even for ops that could run on the CPU (provided a supported GPU is present). So you can simply loop over your devices; note that device indices are zero-based, so an 8-GPU instance exposes '/gpu:0' through '/gpu:7'. Each with tf.device(d) block then places the work it encloses on that GPU, covering all of your instance's GPU resources.
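As a concrete sketch of that loop, assuming an 8-GPU instance like the one in the question:

import tensorflow as tf

# Build one matmul per GPU; each op is placed on the device named in the
# enclosing tf.device() block.
c = []
for d in ['/gpu:%d' % i for i in range(8)]:
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
        c.append(tf.matmul(a, b))

# Combine the per-GPU results on the CPU and run.
with tf.device('/cpu:0'):
    total = tf.add_n(c)
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(total))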
Scaling Keras Model Training to Multiple GPUs
- Keras:
For Keras with the MXNet backend, you can pass a context built from args.num_gpus, where num_gpus is the number of GPUs you want to use:
def backend_agnostic_compile(model, loss, optimizer, metrics, args):
    if keras.backend._backend == 'mxnet':
        gpu_list = ["gpu(%d)" % i for i in range(args.num_gpus)]
        model.compile(loss=loss,
                      optimizer=optimizer,
                      metrics=metrics,
                      context=gpu_list)
    else:
        if args.num_gpus > 1:
            print("Warning: num_gpus > 1 but not using MxNet backend")
        model.compile(loss=loss,
                      optimizer=optimizer,
                      metrics=metrics)
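A hypothetical usage sketch of the function above; the only assumptions are that args carries a num_gpus attribute (argparse is one way to provide it) and that model has already been built:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--num_gpus', type=int, default=8)
args = parser.parse_args()

# Compiles against 8 MXNet GPU contexts, or falls back to a plain compile.
backend_agnostic_compile(model,
                         loss='categorical_crossentropy',
                         optimizer='rmsprop',
                         metrics=['accuracy'],
                         args=args)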
- horovod.tensorflow:
On top of all that, Uber recently open-sourced Horovod, and I think it's great:
import tensorflow as tf
import horovod.tensorflow as hvd

# Initialize Horovod
hvd.init()

# Pin GPU to be used to process local rank (one GPU per process)
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())

# Build model...
loss = ...
opt = tf.train.AdagradOptimizer(0.01)

# Add Horovod Distributed Optimizer
opt = hvd.DistributedOptimizer(opt)

# Add hook to broadcast variables from rank 0 to all other processes during
# initialization.
hooks = [hvd.BroadcastGlobalVariablesHook(0)]

# Make training operation
train_op = opt.minimize(loss)

# The MonitoredTrainingSession takes care of session initialization,
# restoring from a checkpoint, saving to a checkpoint, and closing when done
# or an error occurs.
with tf.train.MonitoredTrainingSession(checkpoint_dir="/tmp/train_logs",
                                       config=config,
                                       hooks=hooks) as mon_sess:
    while not mon_sess.should_stop():
        # Perform synchronous training.
        mon_sess.run(train_op)
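Note that Horovod runs one copy of the script per GPU, so you launch it through MPI rather than with plain python. A hedged example, assuming a single machine with 8 GPUs and a script named train.py:

mpirun -np 8 python train.py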

Basically, you can follow the example below. All you need to do is specify the CPU and GPU device counts after importing Keras.
import keras
import tensorflow as tf

config = tf.ConfigProto(device_count={'GPU': 1, 'CPU': 56})
sess = tf.Session(config=config)
keras.backend.set_session(sess)
After that, you would fit the model:
model.fit(x_train, y_train, epochs=epochs, validation_data=(x_test, y_test))
Finally, you can decrease these device counts if you don't want to run at the upper limits of your machine.

- Session seems to be available under compat in TensorFlow 2.0, but not in the main namespace. – EngrStudent Jan 06 '20 at 20:13
Here is a simple example of how we can access multiple GPUs with Horovod and Keras: see the Keras MNIST Example with Horovod on GitHub.
Plus, please go to the link for further info: Horovod with Keras
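For convenience, here is a minimal sketch of the pattern used in that linked MNIST example (assuming the standalone keras package, horovod.keras, and pre-loaded x_train/y_train; not a verbatim copy of the example):

import keras
import tensorflow as tf
import horovod.keras as hvd

# Initialize Horovod and pin each process to one GPU.
hvd.init()
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())
keras.backend.set_session(tf.Session(config=config))

model = ...  # build your Keras model here

# Wrap the optimizer so gradients are averaged across workers; the
# learning rate is scaled by the number of workers.
opt = hvd.DistributedOptimizer(keras.optimizers.Adadelta(1.0 * hvd.size()))
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])

# Broadcast initial weights from rank 0 so all workers start in sync.
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
model.fit(x_train, y_train, batch_size=128, epochs=10, callbacks=callbacks)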

- And that is? I will try it like that :) – Hector Blandin Oct 19 '17 at 02:36