The world is used to seeing headlines about the latest breakthrough in deep-learning forms of artificial intelligence. The latest achievement of Google’s DeepMind division, however, can be summed up as “one AI program that does a lot of things.”
Gato, as DeepMind’s program is called, was revealed this week as a so-called multimodal program: one that can play video games, chat, write compositions, caption pictures, and control a robot arm stacking blocks. It is a single neural network that can work with multiple kinds of data to perform multiple kinds of tasks.
“Using a single set of weights, Gato can engage in dialogue, caption images, stack blocks with a real robot arm, outperform humans at playing Atari games, navigate in simulated 3D environments, follow instructions, and more,” write lead author Scott Reed and colleagues in their paper, “A Generalist Agent,” posted on the Arxiv preprint server.
DeepMind co-founder Demis Hassabis cheered on the team, exclaiming in a tweet, “Our most general agent yet!! Fantastic work from the team!”
The only catch is that Gato isn’t actually great at many tasks.
On the one hand, the program is able to do better than a dedicated machine learning program at controlling a Sawyer robot arm that stacks blocks. On the other hand, it produces captions for images that in many cases are quite poor. Similarly, its ability at standard chat dialogue with a human interlocutor is mediocre, sometimes producing contradictory and nonsensical utterances.
And its playing of Atari 2600 video games falls below that of most dedicated ML programs designed to compete in the benchmark Arcade Learning Environment.
Why make a program that does some things pretty well and a bunch of other things not so well? Precedent and expectation, according to the authors.
There is precedent for more general types of software becoming state-of-the-art in artificial intelligence, and there is an expectation that increasing amounts of computing power will in the future make up for the shortcomings.
Generalism can tend to win in AI. As the authors note, citing AI scientist Richard Sutton, “Historically, generic models that are better at leveraging computation have also tended to overtake more specialized, domain-specific approaches eventually.”
As Sutton wrote in his own blog post, “The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin.”
Stated as a formal hypothesis, Reed and team write: “Here we test the hypothesis that training an agent which is generally capable on a large number of tasks is possible; and that this general agent can be adapted with little extra data to succeed at an even larger number of tasks.”
The model, in this case, is indeed very general. It is a version of the Transformer, the dominant kind of attention-based model that has become the basis of many programs, including GPT-3. A Transformer models the probability of some element given the elements that surround it, such as words in a sentence.
In Gato’s case, the DeepMind scientists are able to use the same conditional-probability search on many kinds of data.
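To make that idea concrete, here is a toy sketch of the interface such a model exposes: given a context of tokens, produce a conditional probability distribution over the next token. This is not DeepMind's code; the vocabulary, the random projection standing in for attention, and all names here are invented for illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax: turns raw scores into probabilities.
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy vocabulary and stand-in weights. A real Transformer computes logits
# from the whole context via attention; here a mean-pooled random projection
# fakes that, just to show the conditional-probability interface.
VOCAB = ["the", "cat", "sat", "on", "mat"]
rng = np.random.default_rng(0)
EMBED = rng.normal(size=(len(VOCAB), 8))
W_OUT = rng.normal(size=(8, len(VOCAB)))

def next_token_distribution(context_ids):
    # Pool the context embeddings and project to a score per vocabulary item.
    h = EMBED[context_ids].mean(axis=0)
    return softmax(h @ W_OUT)

# P(next token | "the cat"): a distribution over the whole vocabulary.
probs = next_token_distribution([0, 1])
print({tok: round(float(p), 3) for tok, p in zip(VOCAB, probs)})
```

The key point the sketch shows is that nothing in this interface cares what the tokens stand for, which is what lets the same machinery be reused across modalities.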
As Reed and colleagues describe the task of training Gato,
During the training phase of Gato, data from different tasks and modalities are serialized into a flat sequence of tokens, batched, and processed by a transformer neural network similar to a large language model. The loss is masked so that Gato only predicts action and text targets.
In other words, Gato doesn’t treat tokens differently whether they are words in a chat or movement vectors in a block-stacking exercise: it’s all the same.
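A minimal sketch of what such modality-agnostic serialization might look like. The offsets, the episode layout, and the mask scheme here are assumptions for illustration, not Gato's actual encoding; the point is only that every modality lands in one shared integer token space, and a loss mask marks which positions (text and actions, not observations) the model is trained to predict.

```python
# Hypothetical offsets carving one shared integer space into per-modality ranges.
TEXT_OFFSET, IMAGE_OFFSET, ACTION_OFFSET = 0, 1000, 2000

def tokenize_episode(text_ids, image_patch_ids, action_ids):
    """Flatten one episode into a single token sequence plus a loss mask."""
    tokens, loss_mask = [], []
    for t in text_ids:                    # text targets: predicted -> mask = 1
        tokens.append(TEXT_OFFSET + t)
        loss_mask.append(1)
    for p in image_patch_ids:             # observations: not predicted -> mask = 0
        tokens.append(IMAGE_OFFSET + p)
        loss_mask.append(0)
    for a in action_ids:                  # action targets: predicted -> mask = 1
        tokens.append(ACTION_OFFSET + a)
        loss_mask.append(1)
    return tokens, loss_mask

tokens, mask = tokenize_episode([5, 9], [3, 7, 1], [4])
print(tokens)  # [5, 9, 1003, 1007, 1001, 2004]
print(mask)    # [1, 1, 0, 0, 0, 1]
```

Once episodes are flattened this way, the downstream transformer sees only one kind of input, which is what the "it's all the same" observation above amounts to.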
Buried in Reed and team’s paper is a corollary hypothesis: that ever-greater computing power will eventually win out. At the moment, Gato is limited by the response time of the Sawyer robot arm that does the block stacking. At 1.18 billion network parameters, Gato is much smaller than very large AI models such as GPT-3. As deep learning models get bigger, performing inference incurs latency that can cause failures in the non-deterministic world of a real-world robot.
However, Reed and colleagues expect that limit to be surpassed as AI hardware gets faster at processing.
“We focus our training at the operating point of model scale that allows real-time control of real-world robots, currently around 1.2 billion parameters in the case of Gato,” they wrote. “As hardware and model architectures improve, this operating point will naturally increase the feasible model size, pushing generalist models higher up the scaling-law curve.”
Hence, Gato is really a model for how the scale of computing will remain the main vector of machine learning development, by making general models larger and larger. Bigger is better, in other words.
The authors have some evidence for this: Gato does seem to get better as it gets bigger. They compare average scores across all the benchmark tasks for three sizes of the model: 79 million parameters, 364 million, and the main model at 1.18 billion. “We can see that for an equivalent token count, there is a significant performance improvement with increased scale,” the authors wrote.
An interesting question for the future is whether a generalist program is more dangerous than other kinds of AI software. The authors spend a significant amount of the paper discussing the fact that there are potential risks that are not yet well understood.
The idea of a program that handles multiple tasks suggests to the average person a kind of human adaptability, but that may be a dangerous misperception. “For example, physical embodiment could lead to users anthropomorphizing the agent, resulting in misplaced trust in the case of a malfunctioning system, or be exploitable by bad actors,” Reed and team write.
Additionally, while cross-domain knowledge transfer is often a goal in ML research, it could create unexpected and undesired outcomes if certain behaviors (such as arcade game fighting) are transferred to the wrong context.
Hence, they write, “Ethics and safety considerations of knowledge transfer may require substantial new research as generalist systems advance.”
(As an interesting side note, the Gato paper uses a scheme to describe risk devised by former Google AI researcher Margaret Mitchell and colleagues, called Model Cards. Model Cards give a concise summary of what an AI program is, what it does, and what factors affect how it operates. Mitchell wrote last year that she was forced out of Google for supporting her former colleague, Timnit Gebru, whose ethical concerns about AI ran afoul of Google’s AI leadership.)
Gato is by no means unique in its generalizing tendency. It is part of a broad trend toward generalization, and toward larger models that use buckets of horsepower. The world got its first taste of Google’s inclination in this direction last summer, with Google’s “Perceiver” neural network, which combined text Transformer tasks with images, sound, and LiDAR spatial coordinates.
Among its peers are PaLM, the Pathways Language Model, introduced this year by Google scientists: a 540-billion-parameter model that makes use of a new technology for orchestrating thousands of chips, known as Pathways, also invented at Google. There is also a neural network released in January by Meta, called “data2vec,” which uses Transformers for image data, speech audio waveforms, and text language representations, all in one.
What is new about Gato seems to be the intention to take AI used for non-robotics tasks and push it into the realm of robotics.
Gato’s creators, pointing to the achievements of Pathways and other generalist approaches, see the ultimate achievement as an AI that can operate in the real world, with any kind of task.
“Future work should consider how to unify these text capabilities into one fully generalist agent that can also act in real time in the real world, in diverse environments and embodiments,” they write.
You could thus think of Gato as an important step on the path to solving the hardest problem in artificial intelligence: robotics.