Before we try to figure out what the future of AI might look like, it’s helpful to take a look at what AI can already do.
This memo presents a selection of the most impressive results from state-of-the-art AI systems, to give you a sense of current AI capabilities.
ML systems today can only perform a very small portion of tasks that humans can do, and (with a few exceptions) only within narrow specialties (like playing one particular game or generating one particular kind of image).
That said, since deep learning became widespread in the mid-2010s, there has been huge progress in what can be achieved with ML.
PaLM
PaLM (Pathways Language Model) is a 540 billion parameter language model produced by Google Research. It was trained on a dataset of billions of sentences that represent a wide range of natural language use cases. The dataset is a mixture of filtered webpages, books, Wikipedia, news articles, source code, and social media conversations.
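To make “trained on a dataset of sentences” concrete, here is a minimal sketch of the next-token-prediction objective that language models like PaLM are trained with. The toy embedding-plus-linear model below is our stand-in for illustration only; it bears no resemblance to PaLM’s actual 540-billion-parameter Transformer or training code.

```python
# A minimal sketch of next-token prediction, the objective behind models
# like PaLM. The tiny model here is an illustrative stand-in, not PaLM.
import torch
import torch.nn as nn

vocab_size, d_model = 100, 32
embed = nn.Embedding(vocab_size, d_model)   # token -> vector
lm_head = nn.Linear(d_model, vocab_size)    # vector -> next-token scores

tokens = torch.randint(0, vocab_size, (1, 16))  # a toy "sentence" of token ids
hidden = embed(tokens[:, :-1])                  # encode every prefix position
logits = lm_head(hidden)                        # predict the token that follows
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size),             # one prediction per position
    tokens[:, 1:].reshape(-1),                  # the tokens that actually follow
)
loss.backward()  # a real training run repeats this over billions of sentences
```

Everything PaLM does, from joke explanation to coding, emerges from scaling up this one objective: predict the next token, over and over, on a vast and varied corpus.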
PaLM demonstrates impressive natural language understanding and generation capabilities. For example, the model can distinguish cause and effect, infer conceptual combinations in appropriate contexts, and even guess a movie from an emoji.
PaLM can even generate explicit explanations for scenarios that require a complex combination of multi-step logical inference, world knowledge, and deep language understanding. For example, it can provide high quality explanations for novel jokes not found on the web.
Google tested several different sizes of PaLM. As the scale of the model increased, performance improved across tasks and new capabilities were unlocked (such as the ability to output working computer code, do common-sense reasoning, and make logical inferences).
PaLM 540B (the largest model) generalises its ability to understand language to successfully complete coding tasks, such as writing code given a description (text-to-code), translating code from one language to another, and fixing compilation errors (code-to-code). PaLM shows strong performance across coding tasks even though code represented only 5% of the data in its training dataset.
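To illustrate what a text-to-code task looks like, here is a toy example in the usual format: the model is shown only the description (the docstring) and has to produce a working body. The problem is our own invention, not one from the PaLM paper.

```python
# A toy text-to-code task (our own example, not from the PaLM paper).
# The model sees the description and must write the body itself.

def sort_by_length(words: list[str]) -> list[str]:
    """Return the given strings sorted from shortest to longest."""
    # A model-written solution would look something like this:
    return sorted(words, key=len)

print(sort_by_length(["pear", "fig", "banana"]))  # ['fig', 'pear', 'banana']
```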
Math & science problem solving - Minerva (paper) (blog)
Minerva is a model built on the Pathways Language Model (PaLM), further trained on a 118 GB dataset of scientific papers and web pages that contain mathematical expressions.
It was built to solve mathematical and scientific questions like the following:
The input to Minerva is the question, exactly as you see here. It’s worth recalling that Minerva is designed only to predict language and has no access to a calculator or any other external tools.
Given the above input, here is what Minerva outputs (correctly):
It can also correctly answer questions in Biology, Chemistry, Physics, and more:
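To give a flavour of the step-by-step, calculator-free style in which Minerva answers, here is an illustrative problem and worked solution of our own (not an actual Minerva output):

```latex
% Illustrative only: our own problem and solution, written in the general
% style of Minerva's outputs (no calculator, just step-by-step reasoning).
\textbf{Question:} If $f(x) = x^2 - 4x + 3$, for what values of $x$ is $f(x) = 0$?

\textbf{Solution:} We factor: $x^2 - 4x + 3 = (x - 1)(x - 3)$.
Setting each factor to zero gives $x = 1$ or $x = 3$.
\[
\boxed{x = 1 \text{ or } x = 3}
\]
```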
Minerva doesn’t get all of its answers right, but its performance is impressive. To test Minerva’s abilities, researchers evaluated the model on STEM benchmarks ranging in difficulty from grade school problems to graduate-level coursework:
- MATH: High school math competition level problems
- MMLU-STEM: Topics such as engineering, chemistry, math, and physics at high school and college level.
- OCWCourses: College and graduate level problems covering topics such as solid state chemistry, astronomy, differential equations, and special relativity.
- GSM8k: Grade school level math problems
In all cases, Minerva obtained record-breaking results, sometimes by a wide margin.
ACT-1: Transformer for Actions (blog)
ACT-1 is a large-scale model trained to use a computer interface, such as a web browser. It interacts with web pages through actions like clicking, typing, and scrolling.
ACT-1 takes commands in written language and then takes control of the browser to execute them.
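Adept hasn’t published ACT-1’s implementation, so the sketch below is only our guess at the shape of such a system: a loop that renders the page, asks the model for the next action, and applies it. The Action type and the helper names are assumptions for illustration.

```python
# A minimal sketch of the observe -> act loop a system like ACT-1 runs.
# ACT-1's implementation is unpublished; everything here is assumed.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click", "type", or "scroll"
    target: str = ""   # e.g. a DOM element to act on
    text: str = ""     # text to type, if any

def next_action(command: str, page_state: str) -> Action:
    # Stand-in for the trained Transformer: given the user's written
    # command and the current page, predict the next browser action.
    return Action(kind="click", target="#search-button")

def execute(command: str, browser) -> None:
    # Repeatedly observe the page, ask the model for an action, apply it,
    # until the task is judged complete.
    while not browser.task_complete():
        action = next_action(command, browser.render())
        browser.apply(action)
```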
MuZero: Mastering Go, chess, shogi and Atari without rules
In 2016, DeepMind demonstrated AlphaGo, the first AI system to defeat a world champion at the game of Go.
Go is known as the most challenging classical game for artificial intelligence because of its complexity. There are an astonishing 10^170 possible board configurations - more than the number of atoms in the known universe. This makes the game of Go a googol times more complex than chess.
Prior to 2016, despite decades of work, the strongest Go computer programs could only play at the level of human amateurs.
A year later, in 2017, DeepMind introduced AlphaZero - a single system that taught itself from scratch how to master not only Go, but also the games of chess and shogi. AlphaZero was given nothing beyond the basic rules of the games, and achieved mastery level by playing against itself repeatedly.
In 2020, DeepMind introduced MuZero, which it described as “a significant step forward in the pursuit of general-purpose algorithms”. MuZero masters Go, chess, shogi and Atari video games without even being told the rules, learning them itself through its ability to plan winning strategies in unknown environments.
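To make that concrete, here is a toy sketch of MuZero’s core idea: learn a model of the environment and plan inside it, without ever consulting the real rules. Real MuZero learns these functions as neural networks and plans with Monte Carlo tree search; the hand-coded stand-ins and exhaustive lookahead below are our simplifications, purely for illustration.

```python
# A toy sketch of MuZero's core idea: plan by searching imagined futures
# inside a learned model. The hand-coded functions are illustrative
# stand-ins for what MuZero learns as neural networks.

ACTIONS = [0, 1]

def representation(observation):
    # h: encode a raw observation into a latent state.
    return observation % 10

def dynamics(state, action):
    # g: predict the (next latent state, reward) of taking an action.
    next_state = (state + action) % 10
    reward = 1.0 if next_state == 0 else 0.0
    return next_state, reward

def best_return(state, depth):
    # Search imagined futures using only the learned model --
    # the environment's real rules are never consulted.
    if depth == 0:
        return 0.0
    returns = []
    for a in ACTIONS:
        next_state, reward = dynamics(state, a)
        returns.append(reward + best_return(next_state, depth - 1))
    return max(returns)

def plan(observation, depth=3):
    # Pick the action whose imagined future looks best.
    state = representation(observation)
    scored = {}
    for a in ACTIONS:
        next_state, reward = dynamics(state, a)
        scored[a] = reward + best_return(next_state, depth - 1)
    return max(scored, key=scored.get)

print(plan(observation=9))  # -> 1: stepping to state 0 earns the reward
```

The key design choice is that the dynamics model is learned from experience rather than given, so the same planning machinery works in any environment - board game, video game, or otherwise.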
Other examples
- GPT-f, which can solve some Maths Olympiad problems (September 2020)
- Gato, a single ML model capable of doing a huge number of different things (including playing Atari, captioning images, chatting, and stacking blocks with a real robot arm), deciding based on its context what it should output (May 2022)