I have a function f that uses as input a variable x which is a large np.ndarray (lenght 20000).
Execution of f takes very little (about 5ms).
A for loop over a matrix M with many rows
for x in M:
f(x)
takes about 5 times longer than parallelizing using multiprocessing
import multiprocessing
with multiprocessing.Pool() as pool:
pool.map(f, M)
I have tried to parallelize with dask but it loses even against sequential execution. Related post is here but the accepted answer doesn´t work for me. I have tried many thing like use partitions of the data as the best practices say or using dask.bag. I'm running Dask in local machine with 4 physical cores.
So the question is how to use dask with short tasks that take large data as input?