Google's new TPUs are here to accelerate AI training

Google has made another leap forward in the realm of machine learning hardware. The tech giant has begun deploying the second version of its Tensor Processing Unit, a specialized chip meant to accelerate machine learning applications, company CEO Sundar Pichai announced on Wednesday.

The new Cloud TPU sports several improvements over its predecessor. Most notably, it supports training machine learning models in addition to running inference on existing ones. Each Cloud TPU device can provide up to 180 teraflops of processing for those tasks. Google can also network the devices together into sets called TPU Pods, which allow even greater computational gains.

Businesses will be able to use the new chips through Google’s Cloud Platform, as part of its Compute Engine infrastructure-as-a-service offering. In addition, the company is launching a new TensorFlow Research Cloud that will provide researchers with free access to that hardware if they pledge to publicly release the results of their research.

It’s a move that has the potential to drastically accelerate machine learning. Google says its latest machine translation model takes a full day to train on 32 of the highest-powered modern GPUs, while an eighth of a TPU Pod can do the same task in an afternoon.

Machine learning has become increasingly important for powering the next generation of applications. Accelerating the creation of new models means that it’s easier for companies like Google to experiment with different approaches to find the best ones for particular applications.

Google’s new hardware can also serve to attract new customers to its cloud platform, at a time when the company is competing against Microsoft, Amazon, and other tech titans. The Cloud TPU announcement comes a year after Google first unveiled the Tensor Processing Unit at its I/O developer conference.

Programming algorithms that run on TPUs will require the use of TensorFlow, the open source machine learning framework that originated at Google. TensorFlow 1.2 includes new high-level APIs that make it easier to take systems built to run on CPUs and GPUs and also run them on TPUs. Makers of other machine learning frameworks like Caffe can make their tools work with TPUs by designing them to call TensorFlow APIs, according to Google Senior Fellow Jeff Dean.

Dean wouldn’t elaborate on any concrete performance metrics of the Cloud TPU, beyond the chips’ potential teraflops. A recent Google research paper pointed out that performance on the original TPU varies from one algorithm to another, and it’s unclear whether the Cloud TPU behaves similarly.

Google isn’t the only company investing in hardware to help with machine learning. Microsoft is deploying field-programmable gate arrays in its data centers to help accelerate its intelligent applications.

This story has been corrected to clarify availability of the Cloud TPU as part of Google Compute Engine.