
Neural Networks for Beginners

Source: https://arxiv.org/abs/1703.05298

This review covers three commonly used machine learning frameworks: Matlab Neural Network Toolbox, Torch, and TensorFlow. The goal is to give a basic introduction to each framework and to compare their advantages and disadvantages.

Matlab Neural Network Toolbox

One of Matlab's biggest advantages is its user-friendly, interactive interface, which makes it easy to manipulate, analyze, and visualize various types of data.

Of the various Matlab packages, the Statistics and Machine Learning Toolbox and the Neural Network Toolbox are the ones aimed at machine learning and deep learning. The built-in functions discussed below come mostly from the Neural Network Toolbox.

The nnstart function opens a graphical starting point for building a network. For example, if the user wants to load some matrices of data and send them to a network, this is a natural first function to call. To define a network in code, the feedforwardnet function creates a feedforward network object; its outputs can then be compared with the targets to obtain an objective function value (e.g. error). If you want to build a network and run it in GPU mode, the useGPU argument should be passed to the train function during training, for example: nn = train(nn, …, 'useGPU', 'yes').

Once the training data X and the corresponding labels Y are built, training can be started with the aforementioned train(nn, X, Y) function. This is the full-batch case. In practice, a batch-mode implementation is also possible: the number of batches is calculated first, then a loop over the batches extracts the corresponding training data matrices and targets and sends each of them to the train() function as before. During training there are some configurable hyper-parameters; for example, nn.trainParam.lr = 0.01 sets the initial learning rate of the network to 0.01.
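The batch-mode procedure described above can be sketched independently of Matlab. The snippet below is plain Python rather than toolbox code, and the names iterate_batches and batch_size are hypothetical, introduced only for illustration: it computes the number of batches first and then yields each slice of inputs and targets, which would be handed to the training routine in turn.

```python
import math


def iterate_batches(X, Y, batch_size):
    """Yield consecutive (inputs, targets) mini-batches from X and Y."""
    if len(X) != len(Y):
        raise ValueError("X and Y must contain the same number of samples")
    # The number of batches is computed first; the last batch may be smaller.
    num_batches = math.ceil(len(X) / batch_size)
    for b in range(num_batches):
        start = b * batch_size
        end = start + batch_size
        yield X[start:end], Y[start:end]


# Toy data: 5 samples, so batch_size=2 gives batches of size 2, 2 and 1.
X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
Y = [0, 1, 0, 1, 0]
batches = list(iterate_batches(X, Y, batch_size=2))
```

Each element of batches is one (inputs, targets) pair; the full-batch case corresponds to choosing batch_size equal to the number of samples.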

To evaluate the objective, the perform(nn, Y, f) function can be used; it computes the loss according to the network's performance function, where nn is the defined network, Y is the ground-truth target and f is the forward result.

Matlab is also very powerful for visualizing and analyzing data. The official documentation describes several useful built-in functions for data visualization: for example, meshgrid generates a grid over the input space, contour draws the separation surface, and scatter draws the data points.

Torch

Torch is a scientific computing library based on LuaJIT, and its nn package is a neural network library built on top of it. A major advantage of LuaJIT over Python is its speed, together with the simplicity of the language. Compared to Caffe [1], which has many dependencies and is hard to install, the Torch library is designed to be very clean, making it easy to use and to extend. The back end of Torch is written in C, so overall performance is also good. It has its own GPU computing libraries, cutorch and cunn, which allow it to run computations with CUDA.

Before starting with Torch, you should learn Lua, the scripting language in which Torch is used. Its syntax is quite similar to Python's. The only data structure in Lua is the table, written as {}. The table is simple but powerful: it can stand in for almost any Python data structure, such as a list [], a dict {}, an object or a class. Variables are global by default, so whenever possible you should declare them as local to avoid accidental name clashes. Lua has 8 basic data types, namely nil, boolean, number, string, function, userdata, thread and table, and values of these types can be stored in a table. Details on Lua can be found on the official website.

The data format in Torch is torch.Tensor, which acts as an N-D array structure. Conceptually it extends Lua's table to N-dimensional numeric data. Like numpy's array, a torch.Tensor supports a wide range of mathematical operations.

For example, in the above figure, t1 is defined as a tensor of unspecified size, t2 is a 4×3 matrix filled with ones, and t3 is a 3×5 identity matrix. torch.mm() then performs a matrix multiplication between t2 and t3 and stores the result in t1. Finally, the real number 5 is assigned to the first row, second column of t1.
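The same sequence of operations can be traced in a few lines of plain Python. This is only a sketch of the arithmetic, not Torch code: the helpers ones, eye and mm are hypothetical stand-ins written with nested lists, and Python indexes from 0 where Lua indexes from 1, so "first row, second column" becomes [0][1].

```python
def ones(rows, cols):
    """Return a rows x cols matrix filled with 1.0."""
    return [[1.0] * cols for _ in range(rows)]


def eye(rows, cols):
    """Return a rows x cols matrix with 1.0 on the main diagonal."""
    return [[1.0 if i == j else 0.0 for j in range(cols)] for i in range(rows)]


def mm(a, b):
    """Matrix product of a (n x k) and b (k x m)."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


t2 = ones(4, 3)      # 4x3 matrix of ones
t3 = eye(3, 5)       # 3x5 identity-like matrix
t1 = mm(t2, t3)      # result is 4x5
t1[0][1] = 5.0       # first row, second column (0-based indices)
```

The product of a 4×3 and a 3×5 matrix is 4×5, which is why t1 does not need a size when it is first declared.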

The nn package of Torch is used for network construction. Each module in the nn package inherits from the nn.Module class, which defines output and gradInput for the forward and backward passes. In the training phase, you therefore call network:forward() and network:backward() to run the forward pass and backpropagation. To design a module yourself, there are two steps: first write a module:updateOutput(input) function for the forward phase, then write a module:updateGradInput(input, gradOutput) function for backpropagation. If the module has trainable parameters, you also need to write a module:accGradParameters(input, gradOutput) function to calculate the gradient w.r.t. the parameters. The Torch nn library has several kinds of built-in components, such as simple layers (e.g. nn.Linear), transfer functions (e.g. nn.ReLU), criterions (e.g. nn.MSECriterion) and containers (e.g. nn.Sequential). To run a tensor through CUDA, just write tensor:cuda() and it will be transferred to the GPU.
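To make the division of labour between the three methods concrete, here is a minimal sketch of a fully connected layer in plain Python. It is not Torch code: the class and its snake_case method names are hypothetical, chosen to mirror the forward output, the gradient w.r.t. the input, and the gradient w.r.t. the parameters described above.

```python
class Linear:
    """Toy fully connected layer y = W x + b on plain Python lists."""

    def __init__(self, weight, bias):
        self.weight = [row[:] for row in weight]   # shape: out x in
        self.bias = bias[:]                        # shape: out
        self.grad_weight = [[0.0] * len(row) for row in weight]
        self.grad_bias = [0.0] * len(bias)

    def update_output(self, x):
        # Forward pass: output[i] = sum_j W[i][j] * x[j] + b[i]
        self.output = [sum(w * xj for w, xj in zip(row, x)) + b
                       for row, b in zip(self.weight, self.bias)]
        return self.output

    def update_grad_input(self, x, grad_output):
        # Gradient w.r.t. the input: grad_input[j] = sum_i W[i][j] * grad_output[i]
        self.grad_input = [sum(self.weight[i][j] * grad_output[i]
                               for i in range(len(grad_output)))
                           for j in range(len(x))]
        return self.grad_input

    def acc_grad_parameters(self, x, grad_output):
        # Gradients w.r.t. the parameters are accumulated (+=), not overwritten.
        for i, g in enumerate(grad_output):
            for j, xj in enumerate(x):
                self.grad_weight[i][j] += g * xj
            self.grad_bias[i] += g


layer = Linear([[1.0, 2.0], [3.0, 4.0]], [0.5, -0.5])
x = [1.0, -1.0]
out = layer.update_output(x)
grad_out = [1.0, 2.0]          # pretend gradient arriving from the next layer
grad_in = layer.update_grad_input(x, grad_out)
layer.acc_grad_parameters(x, grad_out)
```

A full backward call would simply run the last two methods in sequence, passing grad_in on to the previous layer.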

For training, there are some functions that need to be written:

Because gradients are accumulated, gradParameters should be reset to zero for each batch, a step that is easy to overlook in practice. Torch also provides the scientific visualization package gnuplot, as well as iTorch for interactive visualization. For example, gnuplot.plot() draws a plot and gnuplot.grid(true) adds a grid background. Recently, Facebook announced its new visualization tool Visdom, which also supports Lua.
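The effect of forgetting that reset can be seen in a toy example. The sketch below is plain Python, not Torch: it fits y = w * x by gradient descent with a hypothetical accumulated buffer grad_w standing in for gradParameters, and the reset_gradients flag (also hypothetical) switches the per-batch reset on and off.

```python
def train(data, lr=0.1, epochs=20, reset_gradients=True):
    """Fit y = w * x by gradient descent on squared error, one sample per batch."""
    w = 0.0
    grad_w = 0.0  # accumulated gradient buffer, playing the role of gradParameters
    for _ in range(epochs):
        for x, y in data:
            if reset_gradients:
                grad_w = 0.0              # the reset step that is easy to forget
            pred = w * x                  # forward pass
            grad_out = 2.0 * (pred - y)   # d(loss)/d(pred) for loss = (pred - y)^2
            grad_w += grad_out * x        # backward pass accumulates into the buffer
            w -= lr * grad_w              # parameter update
    return w


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # generated by y = 2 * x
w_reset = train(data, reset_gradients=True)    # converges towards 2.0
w_stale = train(data, reset_gradients=False)   # stale gradients pile up
```

With the reset, the estimate settles at the true slope; without it, every update also re-applies all earlier gradients and, on this data, the estimate moves away from the solution instead of towards it.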

TensorFlow

At the end of 2015, Google announced its open-source product TensorFlow, a general machine learning library for numerical computing. TensorFlow is built on the concept of "computational graphs", where the nodes of the graph correspond to mathematical operators and the edges correspond to the multi-dimensional tensors computed by them. This symbolic approach makes it possible to write only the forward pass and let the framework calculate the gradients automatically. The back end of TensorFlow is written in C++, which keeps it fast and efficient. It also provides a Python API on top, which allows users to build customized models quickly. Python is a general-purpose, object-oriented programming language that is easy to learn and widely used for scientific computing. It runs on the majority of operating systems, including Windows, Linux and Mac OS X.
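The idea of writing only the forward pass and obtaining gradients from the graph can be illustrated with a very small reverse-mode differentiation sketch. This is plain Python and not TensorFlow's implementation; the Node class and the backward function are hypothetical names, and only addition and multiplication are supported.

```python
class Node:
    """A value in a computational graph that remembers how it was produced."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # sequence of (parent_node, local_gradient)
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Node(self.value * other.value,
                    ((self, other.value), (other, self.value)))


def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every node in the graph."""
    order, seen = [], set()

    def visit(node):
        # Post-order traversal: a node is appended after all of its inputs.
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    # Walk from the output back to the inputs, applying the chain rule.
    for node in reversed(order):
        for parent, local_grad in node.parents:
            parent.grad += local_grad * node.grad


# Only the forward pass is written by hand: loss = (w * x + b) ** 2
w, x, b = Node(3.0), Node(2.0), Node(1.0)
pred = w * x + b
loss = pred * pred
backward(loss)
```

After the call, w.grad, x.grad and b.grad hold the derivatives of loss with respect to each input, obtained purely from the recorded graph structure.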

The main components of TensorFlow are the computational graph, tensor, variable, optimizer and session. As mentioned above, a computational graph is a collection of numerical operations, where each node describes a mathematical operation and each edge describes an intermediate output passed between nodes. Once launched, this structure allows parallel computation on a multi-core CPU or a GPU cluster. A tensor is an n-dimensional array, similar to an N-D array in numpy. A variable is a symbolic representation of a parameter, with respect to which derivatives can be computed symbolically; it must be initialized before it is used in a session. An optimizer provides a series of classic optimization methods that calculate the gradient of the defined loss function with respect to the network parameters and use it to update those parameters. A session is the launch procedure that executes the graph on a CPU or GPU to run the real-valued computation.

The above figure allocates placeholders for the inputs and targets; this is a symbolic step in constructing the computation graph. Later, when the session is launched, the placeholders are fed with real-valued inputs.
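The split between building a symbolic graph and later feeding it real values can be mimicked in a few lines of plain Python. The sketch below does not use TensorFlow; Placeholder, Op and run are hypothetical names that only imitate the pattern of declaring inputs first and supplying a feed dictionary at launch time.

```python
class Placeholder:
    """Symbolic input that has no value until the graph is run."""

    def __init__(self, name):
        self.name = name


class Op:
    """Symbolic operation that applies fn to the values of its inputs."""

    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs


def run(node, feed_dict):
    """Evaluate node, substituting real values for placeholders."""
    if isinstance(node, Placeholder):
        if node not in feed_dict:
            raise KeyError(f"no value fed for placeholder '{node.name}'")
        return feed_dict[node]
    args = [run(inp, feed_dict) for inp in node.inputs]
    return node.fn(*args)


# Graph construction: nothing is computed here.
inputs = Placeholder("inputs")
targets = Placeholder("targets")
error = Op(lambda a, b: [ai - bi for ai, bi in zip(a, b)], inputs, targets)
loss = Op(lambda e: sum(v * v for v in e) / len(e), error)

# Launch: the placeholders are instantiated with real values.
result = run(loss, {inputs: [1.0, 2.0, 3.0], targets: [1.0, 0.0, 5.0]})
```

The same graph can be run again with different feed values, which is the practical benefit of separating construction from execution.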

To train a customized model with TensorFlow, there are some basic steps one needs to follow:

TensorFlow provides its own tool for scientific visualization, TensorBoard. It is also common to use the Python plotting package matplotlib, because tf.Tensor values can be converted to NumPy arrays (np.array) directly.

Comparison

In general, the above frameworks are all very useful machine learning tools for numerical computing. Which one is "better" depends entirely on what you mean by "better". I have used both TensorFlow and Torch, so here I mainly list some personal preferences to compare their pros and cons:

In summary, each library has its own pros and cons, and their maintainers keep working to make them better and more popular. I would say Torch and TensorFlow are two of the most popular deep learning frameworks so far, and they contribute a lot to the machine learning open source community.

[1] Caffe: Convolutional Architecture for Fast Feature Embedding. Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor.

 


Author: Shawn Yan | Editor: Junpei Zhong | Localized by Synced Global Team: Xiang Chen
