Myvideo

Guest

Login

NSDI 19 - Tiresias: A GPU Cluster Manager for Distributed Deep Learning

Uploaded By: Myvideo
6 views
0
0 votes
0

Juncheng Gu, Mosharaf Chowdhury, and Kang G. Shin, University of Michigan, Ann Arbor; Yibo Zhu, Microsoft and Bytedance; Myeongjae Jeon, Microsoft and UNIST; Junjie Qian, Microsoft; Hongqiang Liu, Alibaba; Chuanxiong Guo, Bytedance Deep learning (DL) training jobs bring some unique challenges to existing cluster managers, such as unpredictable training times, an all-or-nothing execution model, and inflexibility in GPU sharing. Our analysis of a large GPU cluster in production shows that existing big data s

Share with your friends

Link:

Embed:

Video Size:

Custom size:

x

Add to Playlist:

Favorites
My Playlist
Watch Later