Notes from Mu Li's Talk

Challenges

Scaling Distributed Machine Learning

Distributed Systems (the systems side)

Large-Scale Optimization (the algorithms side)


Method: Systems

Method: Optimization Algorithms

With appropriate computational frameworks and algorithm design, distributed machine learning can be made simple, fast, and scalable, both in theory and in practice.

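The core idea is co-design: the algorithm and the system are designed together. When the system provides enough support, the algorithm can be simpler. For example, in MXNet, even when the algorithm does brute-force synchronous communication, the system can still parallelize it automatically. Likewise, when the communication-to-computation ratio is high, a larger batch size is needed, and that in turn calls for better algorithms, because simply enlarging the batch tends to hurt convergence.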
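The batch-size point can be made concrete with a back-of-the-envelope model. This is a minimal sketch, not from the talk, assuming that each iteration exchanges the full model once in each direction (a gradient push and a weight pull, independent of batch size) while compute grows linearly with the batch. The function name and all numbers (25M parameters, 12 GFLOPs per example, a 10 Gbit/s link, a 10 TFLOP/s device) are hypothetical.

```python
def comm_to_comp_ratio(batch_size: int,
                       model_params: int,
                       flops_per_example: float,
                       bandwidth_bytes_per_s: float,
                       flops_per_s: float,
                       bytes_per_param: int = 4) -> float:
    """Time spent communicating divided by time spent computing, per iteration.

    Assumption: one gradient push plus one weight pull of the whole model per
    iteration, so communication does not depend on batch size; computation
    is linear in batch size.
    """
    comm_bytes = 2 * model_params * bytes_per_param   # push gradients + pull weights
    comm_time = comm_bytes / bandwidth_bytes_per_s
    comp_time = batch_size * flops_per_example / flops_per_s
    return comm_time / comp_time


# Hypothetical workload: 25M float32 parameters, 12 GFLOPs per example,
# 10 Gbit/s network (1.25e9 bytes/s), 10 TFLOP/s accelerator.
for b in (32, 256, 1024):
    r = comm_to_comp_ratio(b, model_params=25_000_000, flops_per_example=12e9,
                           bandwidth_bytes_per_s=1.25e9, flops_per_s=1e13)
    print(f"batch={b:5d}  comm/comp={r:.2f}")
```

Under this model, the ratio falls in inverse proportion to the batch size: going from 32 to 256 cuts it by 8x. That is why a communication-bound setup pushes toward larger batches, and why it then needs an optimizer that still converges well at that batch size.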

Parameter Server

Existing Open Source Systems in 2012

Architecture
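The general parameter-server pattern has server nodes that hold the model parameters as key-value pairs, and worker nodes that pull the current weights, compute gradients on their own data shard, and push those gradients back. The following is a minimal single-process sketch of that push/pull loop, not the actual Parameter Server API. The class and function names are hypothetical. It assumes one server, a single scalar key `"w"`, a toy least-squares problem, and fully synchronous averaging. A real parameter server shards keys across many server nodes and can relax this synchronization.

```python
from typing import Dict, List, Tuple


class ParameterServer:
    """Toy single-node server: stores key -> value and applies SGD on pushes."""

    def __init__(self, init: Dict[str, float], lr: float, num_workers: int):
        self.params = dict(init)
        self.lr = lr
        self.num_workers = num_workers
        self.pending: Dict[str, List[float]] = {k: [] for k in init}

    def pull(self, key: str) -> float:
        # Workers read the current value of a parameter.
        return self.params[key]

    def push(self, key: str, grad: float) -> None:
        # Synchronous aggregation (an assumption of this sketch): buffer
        # gradients until every worker has pushed, then take one SGD step
        # with their average.
        self.pending[key].append(grad)
        if len(self.pending[key]) == self.num_workers:
            avg = sum(self.pending[key]) / self.num_workers
            self.params[key] -= self.lr * avg
            self.pending[key].clear()


def worker_gradient(w: float, shard: List[Tuple[float, float]]) -> float:
    """Gradient of the mean of 0.5 * (w * x - y)^2 over this worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def train(shards: List[List[Tuple[float, float]]],
          steps: int = 100, lr: float = 0.1) -> float:
    server = ParameterServer({"w": 0.0}, lr=lr, num_workers=len(shards))
    for _ in range(steps):
        for shard in shards:
            w = server.pull("w")                          # worker pulls weights
            server.push("w", worker_gradient(w, shard))   # worker pushes gradient
    return server.pull("w")


# Two hypothetical workers, each holding a shard of data generated by y = 3x.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (0.5, 1.5)]]
print(train(shards))
```

Because the server applies an update only after all workers have pushed, every worker in a step pulls the same weight, which makes this equivalent to ordinary mini-batch SGD spread over two shards.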