As detailed here, the way to split neural network training across multiple machines or threads is to divide the training data set into chunks, send one chunk to each node, and then sum the results back on the main node.

Is there a library that already implements these techniques, for example with agents to install on each node?

1 Answer

TensorFlow and PyTorch both support distributed training.
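In PyTorch the usual entry point is `torch.nn.parallel.DistributedDataParallel`, and in TensorFlow it is the `tf.distribute.Strategy` family. To make the chunk-and-sum scheme from the question concrete, here is a minimal framework-free sketch that simulates the workers inside one process. The one-weight model, the helper names (`chunk_gradient`, `split`, `train`) and the toy data are assumptions made for illustration, not part of either library.

```python
# Data-parallel training simulated in a single process (illustrative sketch).
# Assumed toy model: y = w * x with one weight, squared-error loss.

def chunk_gradient(w, chunk):
    """Gradient of sum((w*x - y)**2) over one chunk, with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in chunk)

def split(data, n_workers):
    """Split data into at most n_workers contiguous chunks of near-equal size."""
    size = (len(data) + n_workers - 1) // n_workers
    return [data[i:i + size] for i in range(0, len(data), size)]

def train(data, n_workers=4, lr=0.01, steps=200):
    """Gradient descent where each step sums per-chunk gradients."""
    w = 0.0
    chunks = split(data, n_workers)
    for _ in range(steps):
        # Each "worker" computes the gradient on its own chunk only.
        partial = [chunk_gradient(w, c) for c in chunks]
        # The "main node" sums the partial results and updates the weight.
        total = sum(partial)
        w -= lr * total / len(data)
    return w

# Toy data generated from y = 3 * x, so training should recover w close to 3.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
w = train(data)
```

Because the loss is a sum over examples, the summed per-chunk gradients equal the gradient over the whole data set, which is why this split gives the same update as single-machine training. The real libraries add what the sketch leaves out: process launch on each node, network communication, and synchronizing the gradients between workers.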

Brian Spiering