In which Nvidia responds to Google’s public benchmarks of their Tensor Processing Unit:

To update Google’s comparison, we created the chart below to quantify the performance leap from K80 to P40, and to show how the TPU compares to current NVIDIA technology.

The P40 balances computational precision and throughput, on-chip memory and memory bandwidth to achieve unprecedented performance for training, as well as inferencing. For training, P40 has 10x the bandwidth and 12 teraflops of 32-bit floating point performance. For inferencing, P40 has high-throughput 8-bit integer math and high memory bandwidth.
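
To see why 8-bit integer math matters for inferencing specifically, here's a minimal sketch of symmetric INT8 quantization in NumPy. This is my own illustration of the general technique, not NVIDIA's actual TensorRT pipeline: weights learned in FP32 are rounded to int8, the matmul accumulates in int32 (as INT8 hardware paths do), and the result is rescaled back to float with only a small error.

```python
import numpy as np

rng = np.random.default_rng(0)

# FP32 weights and activations, standing in for one dense layer.
weights = rng.standard_normal((256, 256)).astype(np.float32)
activations = rng.standard_normal((1, 256)).astype(np.float32)

def quantize_int8(x):
    """Symmetric per-tensor quantization: map [-max|x|, +max|x|] onto [-127, 127]."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

q_w, s_w = quantize_int8(weights)
q_a, s_a = quantize_int8(activations)

# Integer matmul accumulated in int32, then rescaled back to float.
acc = q_a.astype(np.int32) @ q_w.astype(np.int32)
approx = acc.astype(np.float32) * (s_a * s_w)

exact = activations @ weights
print("max abs error:", np.abs(approx - exact).max())
```

The payoff: each int8 value takes a quarter of the memory of FP32 and the integer units run at much higher throughput. Training still needs the dynamic range of FP32, which is exactly the training-versus-inferencing split the quote describes.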

The updated chart is worth looking at, but one of the main takeaways is that the P40 delivers roughly 2x the TPU's inferencing performance while drawing 3x the power. For a workstation build that seems like a fair tradeoff (especially since you can't go out and buy a Google TPU for yourself), but in the data center it appears to confirm Google's argument that the TPU let them build fewer data centers (lower power = less heat = higher density).
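
Spelling out the arithmetic behind that density argument, using only the 2x/3x figures above:

```python
# Relative figures from the comparison above:
# P40 ~2x the TPU's inferencing throughput at ~3x the power draw.
relative_perf = 2.0   # P40 throughput relative to TPU
relative_power = 3.0  # P40 power draw relative to TPU

perf_per_watt = relative_perf / relative_power
print(f"P40 perf/watt relative to TPU: {perf_per_watt:.2f}x")  # ~0.67x
```

Two thirds of the TPU's performance per watt is the number that matters at data-center scale, where power and cooling set the density ceiling.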

In broader terms, it's been neat over the last 10 or so years seeing GPUs used (and bragged about) for more than pushing pixels. I think back to Stanford's Folding@home project and what a boon video cards with programmable pipelines became for mapping out proteins. Deep learning is now driving changes in how graphics cards are designed. Pretty amazing.