Researchers at DeepMind have created Perceiver IO, a general-purpose architecture for processing arbitrary types of input and output data.
It builds on the original Perceiver model, introduced in June 2021, which handles images, audio, video and combinations thereof, but is limited to tasks with simple outputs, such as classification.
To remove this limitation, the researchers designed a more general version of the architecture, Perceiver IO. It can produce a wide range of outputs from a variety of inputs, making it applicable to areas such as natural language processing, computer vision and multimodal understanding.
Figure: Optical flow estimated with Perceiver IO, tracking the motion of every pixel in an image; the color of each pixel encodes the direction and speed of its movement. Source: DeepMind.
Perceiver and Perceiver IO are built on the Transformer architecture, which works well for inputs containing up to several thousand elements. Images, audio and video, however, can contain millions of elements, the researchers note.
“With the original Perceiver, we solved the main problem of the universal architecture: scaling Transformers to very large input data without introducing domain-specific assumptions,” the team wrote in a blog post.
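The key idea behind that scaling is a latent bottleneck: a small, fixed-size array of latent vectors cross-attends to the large input, so cost grows linearly with input size rather than quadratically; Perceiver IO then adds task-specific output queries that cross-attend to the latents to produce outputs of any size. The sketch below is a conceptual illustration only (all sizes, the `attend` helper and the single-pass structure are assumptions for clarity, not DeepMind's implementation):

```python
import numpy as np

def attend(q, kv, d):
    """Scaled dot-product cross-attention: rows of q attend over rows of kv,
    which serve as both keys and values in this simplified sketch."""
    scores = q @ kv.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ kv

rng = np.random.default_rng(0)
d = 64                                   # channel dimension (assumed)
inputs = rng.normal(size=(50_000, d))    # M input elements (e.g. pixels)
latents = rng.normal(size=(256, d))      # N << M latent vectors

# Encode: latents cross-attend to the input, costing O(M * N)
# instead of the O(M^2) of full self-attention over the input.
latents = attend(latents, inputs, d)

# Decode (the Perceiver IO addition): output queries cross-attend to the
# latents, decoupling output size and structure from the input.
queries = rng.normal(size=(1_000, d))    # one query per desired output
outputs = attend(queries, latents, d)
print(outputs.shape)  # (1000, 64)
```

Because both the encoder and decoder are cross-attention steps against a fixed latent array, the same core can read millions of input elements and emit structured outputs such as per-pixel flow fields or token sequences.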
The researchers also believe that Perceiver IO can achieve an unprecedented level of universality.
They have published the architecture’s source code on GitHub and hope it will help researchers and practitioners develop applications without having to spend resources creating customized solutions using specialized systems.
Recall that at the end of July, DeepMind unveiled XLand, an extensive game environment for training versatile artificial intelligence agents.
In July, experts from the AI lab compiled and published the most complete database of human protein structures created by the AlphaFold neural network.
In June, DeepMind scientists argued that reinforcement learning is sufficient to achieve artificial general intelligence.