A central goal of artificial intelligence is to build systems that can flexibly process all the world’s data, but current neural network architectures are designed to handle essentially one data configuration. This includes models like 2D convnets and more recent Vision Transformer models, which scale well on images but can be challenging to apply to other data. In this talk, I’ll be describing Perceivers, a new family of architectures that scale well to many kinds of high-dimensional data while making essentially no domain assumptions. Perceivers leverage an asymmetric attention mechanism to encode and decode data from a latent bottleneck. This mechanism allows Perceivers to handle inputs and outputs several orders of magnitude larger than can be used with standard Transformers. Using this architecture, we obtain performance comparable to or better than domain-specific architectures on a wide range of tasks, including image classification, natural language understanding, optical flow, multimodal autoencoding and classification, set processing, and multi-task learning.

Drew Jaegle is a Senior Research Scientist at DeepMind. His research focuses on developing domain-general perceptual and reasoning systems, and his work spans architecture design, imitation learning and RL, self-supervised learning, physical inference, and philosophy of science. Before joining DeepMind, he worked on computer vision and computational neuroscience at the University of Pennsylvania, where he did a PhD in the GRASP Lab with Kostas Daniilidis and a postdoc with Nicole Rust.

A full list of guest lectures can be found here: