After a whole epoch of multiple update steps, the neural networks in each thread will have diverged to the point where averaging their weights may no longer make sense. Ideally you should combine data for each update step. In turn, that means you will want to avoid making an update on every single example, because the overhead of starting, stopping and combining the threads may eat up most of the benefit.
It is common in neural networks to use mini-batches (larger than a single example, smaller than the whole dataset), both to get more accurate gradient estimates and to allow parallelisation. There is often a sweet spot in terms of learning speed (or sample efficiency) at some mini-batch size. For each mini-batch, you calculate gradients for all of its examples, average them into a mean gradient, then perform a single weight update step.
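As a minimal sketch of what one mini-batch update looks like, assuming a plain linear model with squared-error loss purely for illustration (the model, learning rate and function names here are not from the question, just placeholders):

```python
import numpy as np

def minibatch_step(w, X_batch, y_batch, lr=0.01):
    """One mini-batch update for a linear model with squared-error loss.

    Gradients are computed for every example in the batch, averaged into
    a single mean gradient, and applied in one weight update.
    """
    preds = X_batch @ w                       # predictions for the whole batch
    errors = preds - y_batch                  # per-example errors
    grad = X_batch.T @ errors / len(y_batch)  # mean gradient over the batch
    return w - lr * grad                      # single update step

# Illustrative usage: iterate over a toy dataset in mini-batches of 32.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(256, 4)), rng.normal(size=256)
w = np.zeros(4)
batch_size = 32
for start in range(0, len(y), batch_size):
    w = minibatch_step(w, X[start:start + batch_size],
                       y[start:start + batch_size])
```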
Use your threads to calculate the gradients for a mini-batch, dividing the examples up between the threads, and average the gradients across all threads to make a single shared weight update. Larger mini-batches make more efficient use of multiple threads, while smaller mini-batches can be beneficial because you get to make more weight updates per epoch.
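A rough sketch of that pattern, reusing the toy linear model from above; the thread pool, shard sizes and function names are illustrative assumptions rather than any specific framework's API:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def shard_gradient(w, X_shard, y_shard):
    """Mean gradient of the squared-error loss for one thread's shard."""
    errors = X_shard @ w - y_shard
    return X_shard.T @ errors / len(y_shard)

def parallel_minibatch_step(w, X_batch, y_batch, n_threads=4, lr=0.01):
    """Split one mini-batch across threads, average their gradients,
    then apply a single shared weight update."""
    X_shards = np.array_split(X_batch, n_threads)
    y_shards = np.array_split(y_batch, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        grads = list(pool.map(lambda s: shard_gradient(w, *s),
                              zip(X_shards, y_shards)))
    # Weight each shard's gradient by its size so the combined result
    # equals the mean gradient over the whole mini-batch, even when
    # the shards are not exactly equal in size.
    sizes = np.array([len(s) for s in y_shards])
    mean_grad = sum(g * n for g, n in zip(grads, sizes)) / sizes.sum()
    return w - lr * mean_grad  # one shared update for all threads

# Illustrative usage: one parallel update on a batch of 64 examples.
rng = np.random.default_rng(0)
X_batch, y_batch = rng.normal(size=(64, 4)), rng.normal(size=64)
w = parallel_minibatch_step(np.zeros(4), X_batch, y_batch)
```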