NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion (ML Research Paper Explained)

Uploaded By: Myvideo

#nuwa #microsoft #generative

NÜWA is a unifying architecture that can ingest text, images, and videos and brings all of them into a quantized latent representation to support a multitude of visual generation tasks, such as text-to-image, text-guided video manipulation, or sketch-to-video. This paper details how the encoders for the different modalities are constructed, and how the latent representation is transformed using their novel 3D nearby self-attention layers. Experiments are shown on 8 different visual generation tasks that the model supports.

OUTLINE:
0:00 - Intro & Outline
1:20 - Sponsor: ClearML
3:35 - Tasks & Naming
5:10 - The problem with recurrent image generation
7:35 - Creating a shared latent space w/ Vector Quantization
23:20 - Transforming the latent representation
26:25 - Recap: Self- and Cross-Attention
28:50 - 3D Nearby Self-Attention
41:20 - Pre-Training Objective
46:05 - Experimental Results
50:40 - Conclusion & Comments

Paper:
Github: https://github
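
The description names two ideas that are easier to picture with a small example: mapping inputs to a quantized latent representation, and attention restricted to a 3D neighborhood. The following is a minimal sketch in plain Python, not the paper's implementation. The function names `quantize` and `nearby_positions`, the `extent` parameter, and the symmetric window are all assumptions made for illustration; the actual model may define its neighborhood differently (for example, causally in the decoder).

```python
from itertools import product


def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry (squared L2).

    Hypothetical helper: illustrates generic vector quantization only.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))
        for v in vectors
    ]


def nearby_positions(pos, shape, extent):
    """List the (t, h, w) grid positions inside a local window around pos.

    pos    -- (t, h, w) position of the query token
    shape  -- (T, H, W) size of the latent grid
    extent -- (et, eh, ew) half-widths of the window per axis (assumed parameter)

    The window is clipped at the grid borders, so tokens near an edge
    simply see fewer neighbors.
    """
    ranges = [
        range(max(0, p - e), min(n, p + e + 1))
        for p, n, e in zip(pos, shape, extent)
    ]
    return list(product(*ranges))


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0]]
    print(quantize([[0.1, 0.0], [0.9, 1.2]], codebook))
    # A single frame (T=1) treats an image as a one-frame video.
    print(nearby_positions((0, 0, 0), (1, 4, 4), (0, 1, 1)))
```

Under this sketch, an image is just a grid with a single time step, which is one way to read the description's claim that text, images, and videos share one representation. Restricting each token to a local window keeps the number of attended positions bounded by the window size rather than by the full grid size.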
