Neural Networks for Beginners

Synced

9 years ago

Source: https://arxiv.org/abs/1703.05298

This review covers three commonly used machine learning frameworks: Matlab Neural Network Toolbox, Torch, and TensorFlow. The goal is to give a basic introduction to each framework and to compare their advantages and disadvantages.

Matlab Neural Network Toolbox

One of Matlab’s biggest advantages is its user-friendly, interactive interface, which makes it easy to manipulate, analyze, and visualize many types of data.

Among the various Matlab packages, the Statistics and Machine Learning Toolbox and the Neural Network Toolbox are designed for deep learning. The built-in functions discussed below come mostly from the Neural Network Toolbox.

The nnstart function is the entry point for building a network structure. For example, if the user wants to load some matrices of data and send them to the network, this function should be called first. The feedforwardnet function then creates a feed-forward network, whose forward pass produces the output used to evaluate an objective function (e.g. the error). To build a network and run it in GPU mode, pass the useGPU argument to the train function during training, for example: nn = train(nn, … , 'useGPU', 'yes').

Once the training data X and the corresponding labels Y are built, training can be started with the aforementioned train(nn, X, Y) function. This is the full-batch case. In practice, a batch-mode implementation is also possible: first compute the number of batches, then iterate over the batches, extract the corresponding training data matrices and targets, and send them to the train() function as before. Several hyper-parameters can be configured for training; for example, nn.trainParam.lr = 0.01 sets the initial learning rate of the network to 0.01.
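The batch-mode loop can be sketched as follows. This is a framework-neutral illustration in plain Python rather than Matlab code, and the names iterate_batches and batch_size are hypothetical; in Matlab each slice would be passed to train() in turn.

```python
import math


def iterate_batches(X, Y, batch_size):
    """Yield (inputs, targets) slices that cover the data set exactly once."""
    if len(X) != len(Y):
        raise ValueError("X and Y must contain the same number of samples")
    # Number of batches, rounded up so that a smaller final batch is kept.
    num_batches = math.ceil(len(X) / batch_size)
    for b in range(num_batches):
        start = b * batch_size
        end = min(start + batch_size, len(X))
        yield X[start:end], Y[start:end]


# Ten toy samples split into batches of four.
X = [[float(i)] for i in range(10)]
Y = [2.0 * i for i in range(10)]
batches = list(iterate_batches(X, Y, batch_size=4))
batch_sizes = [len(x_batch) for x_batch, _ in batches]
```

With ten samples and a batch size of four, the last batch holds only the two remaining samples.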

For the optimization problem, the perform(nn, Y, f) function evaluates the network's loss function, where nn is the defined network, Y is the ground-truth target, and f is the forward result.

Matlab is also very powerful for data visualization and analysis. The official documentation describes several useful built-in functions: meshgrid generates a grid over the input space, contour draws the separation surface, and scatter draws the data points.

Torch

Torch is a numerical scientific computing library based on LuaJIT, and its nn package is a neural network library built on top of it. The main advantage of LuaJIT over Python is its speed; it is also a simpler language. Compared to Caffe [1], which has many dependencies and is hard to install, the Torch library is designed to be very clean, making it easy to use and to extend. The back-end of Torch is written in C, so overall performance is also good. Torch has its own GPU computing libraries, cutorch and cunn, which allow it to do CUDA computing.

Before starting with Torch, you should learn Lua, the scripting language that sits on top of Torch. Its syntax is quite similar to Python's. The only data structure in Lua is the table, written as {}. The table is simple but powerful: it can stand in for almost any Python data structure, such as a list [], a dict {}, an object, or a class. Variables are global by default, so declare them as local whenever possible to avoid name clashes. Lua has 8 basic data types, namely nil, boolean, number, string, function, userdata, thread and table, and values of these types can be stored in a table. Detailed documentation of Lua can be found on the official website.

The data format in Torch is torch.Tensor, which acts as an N-D array structure. It’s a straightforward extension of Lua’s table. Like numpy’s array, torch.Tensor supports a wide range of mathematical operations.

For example, in the above figure, t1 is defined as a tensor of unknown size, t2 is a 4×3 matrix filled with ones, and t3 is a 3×5 identity matrix. torch.mm() then performs a matrix multiplication of t2 and t3 and stores the result in t1. Finally, the real number 5 is written to the first row, second column of t1.
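The same sequence of operations can be traced with nested Python lists. This is only an analogue of the Torch calls, not Torch code: the helpers ones, eye and matmul are hypothetical stand-ins written for this sketch, and a rectangular "identity" is assumed to mean ones on the main diagonal.

```python
def ones(rows, cols):
    """Matrix of the given shape filled with 1.0."""
    return [[1.0] * cols for _ in range(rows)]


def eye(rows, cols):
    """Rectangular identity: 1.0 on the main diagonal, 0.0 elsewhere."""
    return [[1.0 if r == c else 0.0 for c in range(cols)] for r in range(rows)]


def matmul(a, b):
    """Plain matrix product of a (n x k) and b (k x m)."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


t2 = ones(4, 3)       # 4x3 matrix filled with ones
t3 = eye(3, 5)        # 3x5 identity matrix
t1 = matmul(t2, t3)   # the 4x5 product is stored in t1
t1[0][1] = 5.0        # first row, second column (zero-based indices here)
```

Note that Lua and Torch index from 1, so the final assignment corresponds to position (1, 2) there and to [0][1] in this Python sketch.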

The nn package of Torch is used for network construction. Every module in the nn package inherits from the nn.Module class, which defines output and gradInput for the forward and backward passes. In the training phase, you call network:forward() and network:backward() to run the forward pass and the backward propagation. To design a module yourself, there are two steps: first write a module:updateOutput(input) function for the forward phase, then write a module:updateGradInput(input, gradOutput) function for the backward propagation. If the module has trainable parameters, you also need to write a module:accGradParameters(input, gradOutput) function to calculate the gradient w.r.t. the parameters. The Torch nn library has several kinds of built-in components: simple layers (e.g. nn.Linear), transfer functions (e.g. nn.ReLU()), criterions (e.g. nn.MSECriterion) and containers (e.g. nn.Sequential). To move a tensor to CUDA computing, just write tensor:cuda() and it will be transferred to the GPU automatically.

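A training step needs the following calls:

local output = network:forward(input)                  -- forward pass
local loss = criterion:forward(output, target)         -- loss value
local gradOutput = criterion:backward(output, target)  -- gradient of the loss w.r.t. the output
network:zeroGradParameters()                           -- reset accumulated gradients
network:backward(input, gradOutput)                    -- backward pass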

Because parameter gradients accumulate across backward calls, gradParameters must be reset for every batch, a step that is easy to forget in practice. Torch also provides the scientific visualization package gnuplot, as well as iTorch for interactive visualization. For example, gnuplot.plot() draws a plot and gnuplot.grid(true) adds a grid background. Recently, Facebook announced its new visualization tool visdom, which also supports Lua.
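To illustrate the module interface and the gradient accumulation described in this section, here is a minimal Python sketch. It only mirrors the method layout of nn.Module with snake_case names; the Linear class, the scalar layer y = w * x + b, and the mse_backward helper are assumptions made for this example, not Torch code.

```python
class Linear:
    """Toy scalar layer y = w * x + b, laid out like an nn.Module."""

    def __init__(self, w, b):
        self.w, self.b = w, b
        self.grad_w, self.grad_b = 0.0, 0.0

    def update_output(self, x):
        # Forward pass.
        self.output = self.w * x + self.b
        return self.output

    def update_grad_input(self, x, grad_output):
        # Gradient of the loss with respect to the layer input.
        self.grad_input = self.w * grad_output
        return self.grad_input

    def acc_grad_parameters(self, x, grad_output):
        # Gradients are added to the stored values, not overwritten.
        self.grad_w += x * grad_output
        self.grad_b += grad_output

    def zero_grad_parameters(self):
        self.grad_w, self.grad_b = 0.0, 0.0

    def forward(self, x):
        return self.update_output(x)

    def backward(self, x, grad_output):
        grad_input = self.update_grad_input(x, grad_output)
        self.acc_grad_parameters(x, grad_output)
        return grad_input


def mse_backward(output, target):
    """Derivative of (output - target)^2 with respect to output."""
    return 2.0 * (output - target)


layer = Linear(w=2.0, b=0.0)
x, target = 3.0, 5.0

# Two backward calls without a reset: the stored gradient doubles.
for _ in range(2):
    out = layer.forward(x)
    layer.backward(x, mse_backward(out, target))
accumulated_grad_w = layer.grad_w

# Resetting first gives the gradient of a single batch.
layer.zero_grad_parameters()
out = layer.forward(x)
layer.backward(x, mse_backward(out, target))
single_grad_w = layer.grad_w
```

Skipping the reset leaves the parameter gradient at twice the single-batch value here, which is the silent error the reset call guards against.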

TensorFlow

At the end of 2015, Google announced its open-source product TensorFlow, a general machine learning library for numerical computing and programming. TensorFlow is built around the concept of “computational graphs”, where the nodes of the graph correspond to mathematical operators and the edges correspond to the computed multi-dimensional tensors. Thanks to this symbolic approach, you only need to write the forward pass and can let the framework calculate the gradients automatically. The back-end of TensorFlow is written in C++, which makes it fast and efficient. It also provides a Python API on top, which allows users to quickly build their own customized models. Python is a general-purpose, object-oriented programming language that is easy to learn and widely used for scientific computing. Python runs on the majority of operating systems, such as Windows, Linux and Mac OS X.

The main components of TensorFlow are the computational graph, tensor, variable, optimizer and session. As mentioned above, a computational graph is a collection of numerical computation operations, where each node describes a math operation and each edge describes an intermediate output. Once launched, this structure allows parallel computation on a multi-core CPU or a GPU cluster. A tensor is an n-dimensional array, similar to an N-D array in numpy. A variable is the symbolic representation of a parameter; derivatives with respect to it can be computed at the symbolic level, but it must be initialized before the session is launched. An optimizer provides a series of classic optimization methods that calculate the gradient of the defined loss function with respect to the network parameters and use it to update those parameters. A session is the launch procedure that must be executed on a CPU or GPU to run the real-valued computation.

The above figure allocates placeholders for the inputs and targets. A placeholder is a symbolic construct used to build the computation graph; later, when the session is launched, it is fed with real-valued inputs.
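The idea of a symbolic graph with placeholders can be shown with a small framework-free Python sketch. It is not the TensorFlow API: the Node class and the helpers placeholder, variable, run and gradient are hypothetical names, and only addition and multiplication are supported. The graph is built first, the gradient is derived from the graph itself, and concrete numbers appear only when the graph is run with a feed dict.

```python
class Node:
    """One vertex of the graph: an operation applied to input nodes."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op          # "placeholder", "variable", "add" or "mul"
        self.inputs = inputs
        self.value = value    # only used by variables

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))


def placeholder():
    return Node("placeholder")


def variable(value):
    return Node("variable", value=value)


def run(node, feed):
    """Evaluate a node; feed maps placeholder nodes to concrete values."""
    if node.op == "placeholder":
        return feed[node]
    if node.op == "variable":
        return node.value
    a, b = (run(i, feed) for i in node.inputs)
    return a + b if node.op == "add" else a * b


def gradient(node, wrt):
    """Build a new graph node that computes d(node)/d(wrt)."""
    if node is wrt:
        return variable(1.0)
    if node.op in ("placeholder", "variable"):
        return variable(0.0)
    a, b = node.inputs
    da, db = gradient(a, wrt), gradient(b, wrt)
    if node.op == "add":
        return da + db
    return da * b + a * db    # product rule


# Build the graph symbolically; no numbers flow yet.
x = placeholder()
w = variable(3.0)
b = variable(1.0)
y = w * x + b
dy_dw = gradient(y, w)

# "Launch": supply a real value for the placeholder and evaluate.
y_value = run(y, {x: 2.0})
dy_dw_value = run(dy_dw, {x: 2.0})
```

Only the forward expression y = w * x + b was written by hand; the derivative with respect to w is itself a graph and evaluates to the fed value of x.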

To train a customized model with TensorFlow, there are some basic steps to follow:

Construct the real-valued input data, which can be a numpy array or a Python list.
Define placeholders to represent the data in the computational graph. Each placeholder is declared with a shape and a data type, and is later filled with real-valued input data when the session is launched.
Define variables to represent the network parameters. They must be initialized and work as tensors in the graph. This way, the network can be fully configured as a symbolic computing graph.
Define a network using the various built-in tf functions.
Define a loss function and an optimizer. Many optimization settings can be chosen here, for example an SGD optimizer or an Adam optimizer.
Start a session. This step is very important, because up to this point the computation graph only exists at a symbolic level. After this step, it is instantiated with real-valued inputs. For example, sess = tf.Session() creates a session, and sess.run(tf.initialize_all_variables()) initializes all variables and moves the data flow graph to the CPU/GPU.
Use sess.run(optimiser, feed_input) for training. Here feed_input is a dict {} that fills the placeholders with real-valued inputs.
Run the evaluation step.
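The order of the steps above can be followed in a framework-free Python sketch. It does not use the TensorFlow API: the gradients are written by hand where TensorFlow would derive them from the graph, plain full-batch gradient descent stands in for an optimizer, and all names (params, predict, loss, gradient_step, feed) are hypothetical.

```python
# Step 1: real-valued input data (the targets follow y = 2x + 1).
inputs = [0.0, 1.0, 2.0, 3.0]
targets = [1.0, 3.0, 5.0, 7.0]

# Steps 2-3: a feed dict plays the role of the placeholders,
# and a small dict holds the initialized model parameters.
params = {"w": 0.0, "b": 0.0}


# Step 4: the network, here a single linear unit.
def predict(params, x):
    return params["w"] * x + params["b"]


# Step 5: mean squared error loss and a plain gradient-descent update.
def loss(params, feed):
    errors = [predict(params, x) - y for x, y in zip(feed["x"], feed["y"])]
    return sum(e * e for e in errors) / len(errors)


def gradient_step(params, feed, learning_rate):
    n = len(feed["x"])
    errors = [predict(params, x) - y for x, y in zip(feed["x"], feed["y"])]
    grad_w = sum(2.0 * e * x for e, x in zip(errors, feed["x"])) / n
    grad_b = sum(2.0 * e for e in errors) / n
    params["w"] -= learning_rate * grad_w
    params["b"] -= learning_rate * grad_b


# Steps 6-7: feed the real data and run the update repeatedly.
feed = {"x": inputs, "y": targets}
initial_loss = loss(params, feed)
for _ in range(2000):
    gradient_step(params, feed, learning_rate=0.05)

# Step 8: evaluation.
final_loss = loss(params, feed)
```

In TensorFlow the body of gradient_step would be replaced by a single sess.run call on the optimizer operation with the feed dict.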

TensorFlow provides its own tool for scientific visualization, namely Tensorboard. It is also common to use the Python plotting package matplotlib, because evaluated tensor values can be converted to numpy arrays directly.

Comparison

Generally, all of the above frameworks are very useful machine learning tools for numerical computing. Which one is “better” depends entirely on what you value. I have used both TensorFlow and Torch, so the list below mainly reflects my personal preferences when comparing their pros and cons:

Speed: Matlab has a heavy interface, so it is likely the slowest. TensorFlow requires compilation time: whenever you debug the code, you have to wait while the symbolic computation graph is compiled. Torch has no compile time, so code runs directly, which is a big advantage. As far as I know, Torch is the fastest GPU deep learning library with cudnn support.
Built-in functions: Matlab provides many simple, easily accessible API functions, which makes it the easiest tool for beginners. TensorFlow provides many miscellaneous modules such as tf.contrib and tf.layers, and its function settings are very fine-grained, so you have to consider every detail carefully. It is really hard to master all of the original functions in TensorFlow.
Symbolic Calculus: TensorFlow is clearly the best choice, especially for second-order gradients. Torch itself does not support symbolic computation, which sometimes causes problems. For example, with containers in Lua Torch, if you convolve some layer A, upsample it, concatenate the result with layer A, and then process the output, the container design makes it hard to tell exactly where the gradients flow. With TensorFlow you can inspect each symbolic tensor and its gradient transparently by evaluating it in a session (for example with Tensor.eval()). Recently, Torch announced an autograd package, developed by Twitter, to address this issue.
Visualization: TensorFlow and Matlab are better. Matlab itself is a powerful tool for matrix visualization. TensorFlow offers better computational graph visualization through Tensorboard, and can also benefit from Python’s plotting package matplotlib. In comparison, Torch is weak in visualization: it only supports iTorch for interactive visualization and gnuplot for plotting curves.
Programming Language: Python is better. It has a better string library than Lua and Matlab, and many more libraries for different purposes thanks to its huge community. Matlab is the simplest of the three, and you can inspect a matrix directly on screen. Lua is much easier to interface with C (and C++, CUDA), which is the main reason why the Torch maintainers chose Lua in the first place.
Community: Torch and TensorFlow are better. Torch is mainly maintained by Facebook, Twitter and Nvidia, and TensorFlow is maintained by Google and various other companies. Matlab is regarded more as an R&D framework than as a deployment framework.

In summary, each library has its own pros and cons, and their maintainers keep working to make them better and more popular. I would say Torch and TensorFlow are two of the most popular deep learning frameworks so far, and they contribute a lot to the machine learning open source community.

[1] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional Architecture for Fast Feature Embedding.

Author: Shawn Yan | Editor: Junpei Zhong | Localized by Synced Global Team: Xiang Chen
