This document summarizes a presentation on scaling deep learning algorithms on extreme-scale architectures. It discusses challenges in applying deep learning, a vision for machine/deep learning R&D including novel algorithms, and the MaTEx toolkit, which supports distributed deep learning on GPU and CPU clusters. Sample results show strong and weak scaling of asynchronous gradient descent on Summit. Fault-tolerance needs and the impact of deep learning on other domains are also covered.
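To make the asynchronous gradient descent mentioned above concrete, here is a minimal sketch of lock-free (Hogwild-style) asynchronous SGD using Python threads on a toy least-squares problem. This is a generic illustration of the technique, not the MaTEx implementation; all names (`worker`, `true_w`, learning rate, thread count) are made up for this example.

```python
import threading
import numpy as np

# Synthetic least-squares problem: recover true_w from noiseless data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = np.arange(1.0, 6.0)
y = X @ true_w

w = np.zeros(5)  # shared parameter vector, updated by all workers without locks

def worker(seed, n_steps=5000, lr=0.01):
    """Each worker applies single-sample SGD updates to the shared vector w."""
    global w
    local_rng = np.random.default_rng(seed)  # per-thread RNG (Generator is not thread-safe)
    for _ in range(n_steps):
        i = local_rng.integers(len(X))
        grad = (X[i] @ w - y[i]) * X[i]  # gradient of 0.5*(x_i.w - y_i)^2
        w -= lr * grad                    # racy in-place update, tolerated by Hogwild

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

mse = float(np.mean((X @ w - y) ** 2))
print(f"final MSE: {mse:.6f}")
```

Because the updates are applied without synchronization, workers may occasionally overwrite each other's progress, but for sparse or well-conditioned problems the iterates still converge; at extreme scale the same idea is typically realized with parameter servers or asynchronous collectives rather than shared-memory threads.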