Rationing bandwidth resources for mitigating network resource contention in distributed DNN training clusters

Qi, Q; Xu, F; Chen, L; Zhou, Z

Xu, F (corresponding author), East China Normal Univ, Sch Comp Sci & Technol, Shanghai Key Lab Multidimens Informat Proc, Shanghai, Peoples R China.

CCF TRANSACTIONS ON HIGH PERFORMANCE COMPUTING, 2021; 3 (2): 171

Abstract

Distributed deep neural network (DDNN) training becomes increasingly compelling as the DNN model gets complex and the dataset grows large. Through an ......
