Libraries
MPI: efficient CPU allreduce
dmlc/rabit: fault-tolerant variant
facebookincubator/gloo
Parameter Hub (PHub): from University of Washington
NCCL: Nvidia's efficient multi-GPU collective
ps-lite: DMLC's lightweight parameter server implementation
Technologies behind Distributed Deep Learning: AllReduce
Interface: result = allreduce(float buffer[size])
grad = gradient(net, w)                      # build the gradient graph once
for epoch, data in enumerate(dataset):
    g = net.run(grad, in=data)               # local gradient on this worker's shard
    gsum = comm.allreduce(g, op=sum)         # sum gradients across all workers
    w -= lr * gsum / num_workers             # every worker applies the same update
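A minimal in-process sketch of the loop above, with the allreduce simulated by summing the workers' buffers directly (the worker count, gradient values, and learning rate are made-up illustration values, not from any particular framework):

```python
def allreduce(buffers, op=sum):
    """Sum-allreduce: every worker ends up holding the elementwise reduction."""
    total = [op(vals) for vals in zip(*buffers)]
    return [list(total) for _ in buffers]

num_workers = 4
lr = 0.1
w = [1.0, -2.0]  # model weights, replicated on every worker

# Each worker computes a gradient on its own shard of the data.
local_grads = [[0.1 * (k + 1), -0.2 * (k + 1)] for k in range(num_workers)]

reduced = allreduce(local_grads)  # all workers now hold the same sum
gsum = reduced[0]
w = [wi - lr * gi / num_workers for wi, gi in zip(w, gsum)]
```

Because every worker receives the identical `gsum`, the replicated weights stay in sync without any central server.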

time complexity (common ring algorithm; p workers, n-element buffer):
    2(p-1) communication steps: p-1 for reduce-scatter, p-1 for allgather
    each worker sends about 2n(p-1)/p elements, i.e. roughly 2n, nearly independent of p

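Those step counts come from the ring algorithm's two phases: a reduce-scatter that leaves each worker holding one fully summed chunk, then an allgather that circulates the finished chunks. A small simulation of that pattern (illustrative only, not how NCCL or MPI actually implement it):

```python
def ring_allreduce(inputs):
    """Simulate ring allreduce: reduce-scatter then allgather, p-1 steps each."""
    p = len(inputs)
    n = len(inputs[0])
    assert n % p == 0, "real implementations pad so the buffer splits evenly"
    c = n // p
    bufs = [list(b) for b in inputs]

    def seg(i):  # slice bounds of chunk i (mod p)
        i %= p
        return slice(i * c, (i + 1) * c)

    # Phase 1: reduce-scatter. At each step, worker r sends one chunk to
    # its ring neighbour r+1, which adds it into its own copy of that chunk.
    for t in range(p - 1):
        msgs = [(r, (r - t) % p, bufs[r][seg(r - t)]) for r in range(p)]
        for r, idx, data in msgs:
            s = seg(idx)
            dst = bufs[(r + 1) % p]
            dst[s] = [a + b for a, b in zip(dst[s], data)]

    # Phase 2: allgather. Each fully reduced chunk circulates unchanged
    # around the ring until every worker holds every chunk.
    for t in range(p - 1):
        msgs = [(r, (r + 1 - t) % p, bufs[r][seg(r + 1 - t)]) for r in range(p)]
        for r, idx, data in msgs:
            bufs[(r + 1) % p][seg(idx)] = data
    return bufs

out = ring_allreduce([[float(r)] * 4 for r in range(4)])
# every worker ends up with the elementwise sum [6.0, 6.0, 6.0, 6.0]
```

Each worker only ever talks to its ring neighbour and sends one chunk (n/p elements) per step, which is why the per-worker traffic stays near 2n regardless of p.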
Parameter Server
Interface: key-value store
    ps.push(index, gradient) and ps.pull(index)
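A toy single-process sketch of that push/pull interface (the class, key names, and server-side SGD update rule here are illustrative assumptions, not ps-lite's actual API):

```python
class ParamServer:
    """Toy key-value parameter server: workers push gradients, pull weights."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.store = {}  # index -> parameter vector

    def init(self, index, value):
        self.store[index] = list(value)

    def push(self, index, gradient):
        # The server applies the SGD update as each gradient arrives.
        w = self.store[index]
        for i, g in enumerate(gradient):
            w[i] -= self.lr * g

    def pull(self, index):
        return list(self.store[index])

ps = ParamServer(lr=0.1)
ps.init("layer0/w", [1.0, 1.0])
ps.push("layer0/w", [0.5, -0.5])  # a worker pushes its gradient
w = ps.pull("layer0/w")           # workers pull the updated weights
```

Unlike allreduce, workers here never talk to each other: all traffic goes through the server, which makes asynchronous updates easy but puts the server's bandwidth on the critical path.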