Last year, Microsoft announced a billion-dollar investment in OpenAI, an organization whose mission is to create artificial general intelligence and make it safe for humanity. No Terminator-like dystopias here. No deranged machines making humans into paperclips. Just computers with general intelligence helping us solve our biggest problems.
A year on, we have the first results of that partnership. At Microsoft Build 2020, a developer conference showcasing Microsoft’s latest and greatest, the company said it had completed a supercomputer exclusively for OpenAI’s machine learning research. But this is no run-of-the-mill supercomputer. It’s a beast of a machine. The company said it has 285,000 CPU cores, 10,000 GPUs, and 400 gigabits per second of network connectivity for each GPU server.
Stacked against the fastest supercomputers on the planet, Microsoft says it’d rank fifth.
The company didn’t release performance data, and the computer hasn’t been publicly benchmarked or included on the widely followed Top500 list of supercomputers. But even absent official rankings, it’s likely safe to say it’s a world-class machine.
“As we’ve learned more and more about what we need and the different limits of all the components that make up a supercomputer, we were really able to say, ‘If we could design our dream system, what would it look like?’” said OpenAI CEO Sam Altman. “And then Microsoft was able to build it.”
What will OpenAI do with this dream machine? The company is building ever bigger narrow AI algorithms (we’re nowhere near AGI yet), and those algorithms need a lot of computing power.
The Pursuit of Very Large AI Models
The size of the most advanced AI models—that is, the neural networks in machine learning algorithms—has been growing fast. At the same time, according to OpenAI, the computing power needed to train these models has been doubling every 3.4 months.
The bigger the model, the bigger the computer you need to train it.
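To get a feel for what a 3.4-month doubling time implies, here is a minimal arithmetic sketch. It assumes the trend is a clean exponential, which is a simplification, and the function name `compute_growth` is ours, not OpenAI’s.

```python
# Doubling time quoted in the article (attributed to OpenAI).
DOUBLING_MONTHS = 3.4


def compute_growth(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Return the multiplicative growth in training compute after `months`,
    assuming a steady exponential trend (an idealization)."""
    return 2.0 ** (months / doubling_months)


# Print the implied growth factor over a few time spans.
for months in (3.4, 12, 24):
    print(f"{months:>5} months -> about {compute_growth(months):,.1f}x")
```

Under that assumption, compute requirements grow by a bit more than 11 times per year, which is far faster than a doubling every year or two.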
This growth is in part due to the number of parameters used in each model. Put simply, parameters are the values that the “neurons” in a neural net settle on during training. OpenAI’s GPT-2 algorithm, which generated convincing text from prompts, consisted of nearly 1.5 billion parameters. Microsoft’s natural language generating AI model, Turing NLG, was over 10 times bigger, weighing in at 17 billion parameters. Now, OpenAI’s GPT-3, just announced Thursday, is reportedly made up of a staggering 175 billion parameters.
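The jump between these models is easier to see as ratios. The sketch below uses the parameter counts quoted above. The storage estimate is our own illustration and assumes each parameter is stored as a 16-bit number, which the companies have not confirmed for these models.

```python
# Parameter counts as quoted in the article (approximate).
PARAMS = {
    "GPT-2": 1.5e9,
    "Turing NLG": 17e9,
    "GPT-3": 175e9,
}

# Assumption for illustration only: 2 bytes (16 bits) per parameter.
BYTES_PER_PARAM = 2


def size_ratio(bigger: str, smaller: str) -> float:
    """How many times more parameters `bigger` has than `smaller`."""
    return PARAMS[bigger] / PARAMS[smaller]


def weights_gigabytes(model: str) -> float:
    """Rough storage for the weights alone, in gigabytes (10^9 bytes)."""
    return PARAMS[model] * BYTES_PER_PARAM / 1e9


for name, count in PARAMS.items():
    print(f"{name}: {count / 1e9:g}B parameters, "
          f"~{weights_gigabytes(name):g} GB of weights")

print(f"Turing NLG vs. GPT-2: {size_ratio('Turing NLG', 'GPT-2'):.1f}x")
print(f"GPT-3 vs. Turing NLG: {size_ratio('GPT-3', 'Turing NLG'):.1f}x")
```

Under the 16-bit assumption, GPT-3’s weights alone would take up about 350 GB, more than the memory of any single GPU available at the time. That is one reason training has to be spread across many machines.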
There’s another trend at play too.
Whereas many machine learning algorithms are trained on human-labeled data sets, Microsoft, OpenAI, and others are also pursuing “unsupervised” machine learning. This means that with enough raw, unlabeled data the algorithms teach themselves by identifying patterns in that data.
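As a toy illustration of learning from unlabeled text, the sketch below counts which word tends to follow which in a tiny made-up corpus, then uses those counts to guess the next word. This is not how Turing NLG or GPT-3 work (they are large neural networks); it only shows that raw text carries its own training signal, with no human labels involved. The corpus and function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(text: str) -> dict:
    """Count, for every word, which words follow it in the raw text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(model: dict, word: str):
    """Return the most frequently seen follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up, unlabeled "training data".
corpus = "the cat sat on the mat . the cat chased the mouse . the cat slept"
model = train_bigram(corpus)

print(predict_next(model, "the"))    # most common word after "the" in the corpus
print(predict_next(model, "zebra"))  # a word the model never saw
```

The pattern (“cat” usually follows “the” here) comes entirely from the text itself. Large language models exploit the same kind of signal at vastly greater scale.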
Some of the latest systems can also perform more than one task in a given domain. An algorithm trained on the raw text of billions of internet pages—from Wikipedia entries to self-published books—can infer relationships between words, concepts, and context. Instead of being able to do only one thing, like generate text, it can transfer its learning to multiple related tasks in the same domain, like also reading documents and answering questions.
The Turing NLG and GPT-3 algorithms fall into this category.
“The exciting thing about these models is the breadth of things they’re going to enable,” said Microsoft Chief Technology Officer Kevin Scott. “This is about being able to do a hundred exciting things in natural language processing at once and a hundred exciting things in computer vision, and when you start to see combinations of these perceptual domains, you’re going to have new applications that are hard to even imagine right now.”
If Only We Had a Bigger Computer…
To be clear, this isn’t AGI, and there’s no certain path to AGI yet. But algorithms beginning to modestly generalize within domains is progress.
A looming question is whether the approach will continue progressing as long as researchers can throw more computing power at it, or if today’s machine learning needs to be augmented with other techniques. Also, if the most advanced AI research requires such prodigious resources, then increasingly, only the most well-heeled, well-connected private organizations will be able to play.
Some good news is that even as AI model size is growing, the efficiency of those models is improving too. Each new breakthrough requires a big jump in computing power, but later models are tweaked and tuned, such that successor algorithms can do as well or better with less computing power.
Microsoft also announced an update to its open source deep learning toolset, DeepSpeed, first released in February. The company says DeepSpeed can help developers train models 15 times larger and 10 times faster using the same computing resources. It also plans to open source its Turing models so the broader community can build on them.
The general idea is that once one of these very large AI models has been trained, it can actually be customized and employed by other researchers or companies with far fewer resources.
In any case, Microsoft and OpenAI are committed to very large AI, and their new machine may be followed by even bigger systems in the years ahead.
“We’re testing a hypothesis that has been there since the beginning of the field: that a neural network close to the size of the human brain can be trained to be an AGI,” Greg Brockman, OpenAI’s co-founder, chairman, and CTO, told the Financial Times when Microsoft’s investment was first made public. “If the hypothesis is true, the upside for humanity will be remarkable.”