Google Cloud TPUs Now Speak Julia

A new paper from Julia Computing Co-Founder and CTO Keno Fischer and Senior Research Engineer Elliot Saba introduces a method and implementation for offloading sections of machine learning models written in the Julia programming language to TPUs.

Tensor Processing Units (TPUs) are Google’s custom-developed Application-Specific Integrated Circuits (ASICs) used to accelerate machine learning workloads. Google Cloud TPUs are a cutting-edge hardware architecture for training today’s computationally demanding deep learning and machine learning models. Google first made a Cloud TPU machine learning accelerator available to the public in 2017. Fischer and Saba’s method works by leveraging the lower-level XLA (Accelerated Linear Algebra) compiler that Google released in August 2018.

Mapping Julia Semantics to XLA

To offload Julia code to a TPU, the code must first be compiled to XLA code. When it generates native code, the Julia compiler already bridges the gap between the dynamic semantics of the language and the static semantics of the LLVM (Low Level Virtual Machine) representation. If Julia code can be converted in a similar way to XLA’s “High Level Optimizer” (HLO) input language, then Julia can run on TPUs.

Julia programs are written in terms of functions and abstractions provided by Julia’s base library. Because Julia uses multiple dispatch, those operations can be given new definitions expressed in terms of HLO operations. A few examples of this are shown below:
image.png
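The idea can be illustrated with a minimal sketch. It is written in Python rather than Julia, and it is not the paper's code: the names HloOp, TracedArray, parameter and dense are hypothetical, and Python operator overloading stands in for Julia's multiple dispatch. The point is only that an array type can define its operations so that they record HLO-style nodes with static shapes instead of computing values.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HloOp:
    """Hypothetical node in an HLO-like expression graph (illustration only)."""
    name: str                           # operation name, e.g. "parameter", "dot", "add"
    shape: Tuple[int, ...]              # result shape, known statically
    operands: Tuple["HloOp", ...] = ()  # input nodes


class TracedArray:
    """Hypothetical array type: its operations emit HloOp nodes, not values."""

    def __init__(self, op: HloOp):
        self.op = op

    @property
    def shape(self) -> Tuple[int, ...]:
        return self.op.shape

    def __matmul__(self, other: "TracedArray") -> "TracedArray":
        # Matrix multiplication is recorded as a "dot" node.
        if len(self.shape) != 2 or len(other.shape) != 2:
            raise ValueError("this sketch only handles 2-D operands")
        if self.shape[1] != other.shape[0]:
            raise ValueError("inner dimensions do not match")
        out_shape = (self.shape[0], other.shape[1])
        return TracedArray(HloOp("dot", out_shape, (self.op, other.op)))

    def __add__(self, other: "TracedArray") -> "TracedArray":
        # Elementwise addition is recorded as an "add" node.
        if self.shape != other.shape:
            raise ValueError("shapes do not match")
        return TracedArray(HloOp("add", self.shape, (self.op, other.op)))


def parameter(shape: Tuple[int, ...]) -> TracedArray:
    """Create a graph input with a fixed shape."""
    return TracedArray(HloOp("parameter", tuple(shape)))


def dense(w, x, b):
    """Ordinary-looking user code; it does not know it is being traced."""
    return w @ x + b


y = dense(parameter((4, 3)), parameter((3, 2)), parameter((4, 2)))
print(y.op.name, y.shape, [o.name for o in y.op.operands])
```

The function dense is written as ordinary array code, yet calling it with traced inputs yields a small static graph. In Julia, the same effect comes from adding methods for a new array type to existing generic functions.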
The paper also provides implementations of the higher-level array abstractions, in particular mapreduce and broadcast. The broadcast implementation in terms of HLO operations is around 20 lines of code and is omitted for space, but the implementation of mapreduce is simply:
image (1).png
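For readers unfamiliar with the abstraction, the semantics of mapreduce can be sketched in a few lines. This is a plain Python illustration over a flat sequence, not the paper's HLO-based implementation, and the explicit init argument is an assumption of this sketch: a function is applied to every element, and the results are combined with a binary operator.

```python
from functools import reduce
import operator


def mapreduce(f, op, xs, init):
    """Apply f to each element of xs, then fold the results with op.

    Semantic sketch only: the version in the paper is expressed in terms
    of HLO operations and works on arrays, not Python sequences.
    """
    mapped = map(f, xs)               # the "map" step
    return reduce(op, mapped, init)   # the "reduce" step


# Sum of squares: map v -> v * v, reduce with +.
sum_of_squares = mapreduce(lambda v: v * v, operator.add, [1, 2, 3, 4], 0)

# Largest absolute value: map abs, reduce with max.
largest_abs = mapreduce(abs, max, [-5, 2, 3], 0)

print(sum_of_squares, largest_abs)
```

Because the operation is a map followed by a reduction, it is natural to express it as a composition of two primitive array operations.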

Evaluation on TPUs

To demonstrate that code produced by the Julia compiler runs on TPUs without major issues, the paper includes examples such as the forward pass and the backward pass of VGG19 (the Visual Geometry Group's 19-layer convolutional neural network architecture). Below are some results, with notes excerpted from the paper:

image (2).png
“Timings for the VGG19 forward pass for varying batch sizes. Flux CPU is Flux master/Julia master without the XLA compiler. PyTorch CPU is the equivalent model in pytorch on the same CPU. FluXLA CPU is our work against an xrt implementation running on the CPU, FluXLA TPU (total) is end-to-end time as reported by the client (including kernel launch overhead and data transfer back from Google Cloud over the internet – note that as a result of the additional network transfer this measurement had significant variability), FluXLA TPU (compute) is the total compute time on TPUs as reported by the cloud profiler (unlike the total time measurement, this measurement was very stable). All CPU measurements on Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz CPUs supporting AVX512. Up to 20 cores were available and CPU benchmarks were not restricted to a single core (though in practice not all CPU benchmarks actually used the available parallelism). TPU benchmarks were restricted to a single TPU core. All Timings are minimum of 4 runs (except FluXLA CPU for N=100 which failed to finish a single run within 10 minutes).”
image (3).png
“Breakdown of instruction counts of the Metalhead.jl VGG19 forward pass and backwards pass after compilation to XLA. Both unoptimized (after the Julia frontend) and optimized counts (after an XLA optimization pipeline similar to that used by the CPU backend, but without HLO fusion) are shown. For each, the count is further broken down into instructions in the entry computation (E) and instruction counts in all computations (T)”

The new method has been welcomed by ML researchers and garnered praise from Google AI Lead Jeff Dean, who tweeted “Julia + TPUs = fast and easily expressible ML computations!”

The paper, Automatic Full Compilation of Julia Programs and ML Models to Cloud TPUs, is available on arXiv.


Author: Robert Tian | Editor: Michael Sarazen
