Deep3: Leveraging Three Levels of Parallelism for Efficient Deep Learning.
This paper proposes Deep3, an automated, platform-aware Deep Learning (DL) framework that brings orders-of-magnitude performance improvements to DL training and execution. Deep3 is the first framework to simultaneously leverage three levels of parallelism for DL: data, network, and hardware. It uses platform profiling to abstract the physical characteristics of the target platform. The core of Deep3 is a new, extensible methodology that incorporates these platform characteristics into the higher-level data and neural network transformations. We provide accompanying libraries that automate customization and adaptation to different datasets and platforms. Proof-of-concept evaluations demonstrate a 10- to 100-fold improvement in physical performance over state-of-the-art DL frameworks such as TensorFlow.
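To make the idea of platform profiling concrete, the following is a minimal sketch and not the Deep3 implementation. It assumes that profiling amounts to timing a representative workload at several candidate sizes on the target machine and keeping the size with the highest measured throughput. The function names (`profile_throughput`, `dot_workload`, `pick_best_size`) and the candidate sizes are hypothetical and chosen only for illustration.

```python
import time


def profile_throughput(workload, sizes, repeats=3):
    """Return {size: items processed per second} as measured on this machine."""
    results = {}
    for size in sizes:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            workload(size)
            elapsed = time.perf_counter() - start
            best = min(best, elapsed)  # keep the fastest run to reduce noise
        results[size] = size / max(best, 1e-9)  # guard against a zero timing
    return results


def dot_workload(size):
    # Hypothetical stand-in for a DL kernel: a dense dot product of length `size`.
    a = [1.0] * size
    b = [2.0] * size
    return sum(x * y for x, y in zip(a, b))


def pick_best_size(results):
    """Return the candidate size with the highest measured throughput."""
    return max(results, key=results.get)


# Profile three assumed candidate sizes and select one for this platform.
profile = profile_throughput(dot_workload, [256, 1024, 4096])
best_size = pick_best_size(profile)
print("measured throughput:", profile)
print("selected size:", best_size)
```

The selected size depends on the machine that runs the sketch, which is the point: the same code yields different parameters on different platforms. A real framework would profile actual DL kernels and communication costs rather than a toy dot product.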