Many people regard the human brain as a sophisticated “machine”. We are amazed by its wide range of capabilities, yet we do not know its limits. Studies of the brain’s mechanisms have been fruitful so far, but much more work is needed to assemble their findings into a complete picture. Even though the exact workings of the brain remain a mystery, it has already inspired designs, applications, and algorithms in machine learning and artificial intelligence. In this paper, the authors adopted an interesting viewpoint: instead of investigating what steps the brain takes to process information, they chose to link brain activity directly to the learning process.
The authors’ strategy was straightforward: if we cannot yet reveal what happens inside the brain, let’s set that question aside. Instead, we can simply bias the solution of a machine learning algorithm so that it more closely matches the internal representations found in the visual cortex. The idea sounds simple, yet it seems that no one had tried it before. Nor did the concept come out of thin air: previous studies have constrained learned models using human behavior, and a method for linking images to the corresponding brain activity in EEG recordings has been demonstrated.
Instead of EEG recordings, the authors chose fMRI recordings of the visual cortex, possibly because fMRI has higher spatial resolution: by measuring changes in blood oxygenation, it shows activity in different brain areas in finer detail. The authors trained supervised classification models for visual object categorization, weighting individual training images by values derived from fMRI recordings made while the subject viewed those same images. Once trained, these models classify images without the benefit of fMRI data. Models trained without the fMRI-derived weights serve as baselines.
fMRI was used to record BOLD (blood-oxygen-level-dependent) voxel responses, which essentially reflect the ratio of deoxyhemoglobin to oxyhemoglobin, since brain activity consumes oxygen. One subject was recorded while viewing 1,386 color images of natural scenes. After the fMRI data were processed, response amplitude values for 67,600 voxels were available for each image. From this set of voxels, 3,569 were labeled as belonging to one of thirteen visual regions of interest (ROIs), spanning areas responsible for low-level through higher-level visual processing. In this experiment, the authors focused on seven regions associated with higher-level visual processing, for use in object classification tasks probing the semantic understanding of visual information: extrastriate body area (EBA), fusiform face area (FFA), lateral occipital cortex (LO), occipital face area (OFA), parahippocampal place area (PPA), retrosplenial cortex (RSC), and transverse occipital sulcus (TOS). Of the 3,569 voxels, 1,427 belong to these regions.
The entire experiment is divided into two phases:
1. Derive per-stimulus “activity weights” from the fMRI data.
2. Train image classifiers.
In the first phase, activity vectors for the ROIs were collected for each stimulus through fMRI recordings. These activity vectors and their category labels were then split into training and test sets. A support vector machine (SVM) classifier was trained on the activity vectors for a given binary object classification task. The last step was to transform the classification scores into probability values via a logistic function; these probabilities serve as the activity weights.
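The last step of Phase I, turning SVM scores into activity weights, can be sketched as follows. This is a minimal illustration and not the authors’ code: the function names, the example scores, and the logistic parameters `a` and `b` are all assumptions. In Platt-style scaling those two parameters would be fitted to held-out scores rather than fixed by hand.

```python
import math


def logistic_probability(score, a=-1.0, b=0.0):
    """Map an SVM decision score to a value in (0, 1) with a logistic function.

    a and b are hypothetical placeholder parameters; a calibration step
    would normally estimate them from data.
    """
    return 1.0 / (1.0 + math.exp(a * score + b))


def activity_weights(scores):
    """Turn per-stimulus SVM decision scores into per-stimulus weights."""
    return [logistic_probability(s) for s in scores]


# Hypothetical decision scores for four stimuli; a larger score means the
# voxel pattern lies further on the in-category side of the boundary.
scores = [2.5, 0.3, -0.2, -3.0]
weights = activity_weights(scores)

for s, w in zip(scores, weights):
    print(f"score={s:+.1f} -> activity weight={w:.3f}")
```

With a negative `a`, a stimulus whose voxel pattern lies far from the decision boundary on the in-category side receives a weight close to 1, while an ambiguous pattern near the boundary receives a weight close to 0.5.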
Based on the generated activity weights, two classification models operating on image features of the visual stimuli were trained in Phase II. The activity weights were available only during training; at test time, both trained models made predictions based solely on image features. The difference between the two SVM classifiers lay in the loss function: one used a loss (e.g., hinge loss) that weights the misclassification of all samples equally as a function of distance from the SVM’s own decision boundary, while the other used a modified loss (the activity weighted loss) that penalizes the misclassification of samples with large activity weights more aggressively.
In machine learning, the hinge loss penalizes misclassified data points, and the penalty grows in proportion to how erroneous the prediction is. Neural activity in the brain, however, is never a single “prediction”. A specific stimulus usually corresponds to a pattern of activation in a region: when the response is strong, it is easy to tell that the stimulus was presented, and when the activation is weak, the stimulus is harder to recognize. With this in mind, the activity weighted loss (AWL) modifies the hinge loss so that misclassified training samples are penalized in proportion to their inconsistency with the evidence of human decision making found in the fMRI measurements. A large activity weight means a strong neural response to the visual stimulus, so AWL imposes a higher penalty when the classifier misclassifies a sample that the subject’s brain activity identifies clearly.
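The contrast between the two losses can be sketched in a few lines. The exact form used here, multiplying the hinge penalty by (1 + c) whenever a sample violates the margin, is an assumed form chosen for illustration; the function names and the example values are likewise hypothetical.

```python
def hinge_loss(z):
    """Standard hinge loss, where z = y * f(x) is the signed margin."""
    return max(0.0, 1.0 - z)


def activity_weighted_loss(z, c):
    """Hinge loss scaled up for margin-violating samples.

    c >= 0 is the sample's activity weight. The (1 + c) multiplier is an
    assumption made for this sketch.
    """
    multiplier = 1.0 + c if z < 1.0 else 1.0
    return max(0.0, (1.0 - z) * multiplier)


# The same misclassified sample (z = -0.5) under a weak and a strong
# activity weight, plus a correctly classified sample outside the margin.
z_wrong = -0.5
print(hinge_loss(z_wrong))                   # baseline penalty
print(activity_weighted_loss(z_wrong, 0.1))  # weak neural evidence
print(activity_weighted_loss(z_wrong, 0.9))  # strong neural evidence
print(activity_weighted_loss(2.0, 0.9))      # no penalty beyond the margin
```

In this form the activity weight only matters for samples inside the margin or on the wrong side of the boundary, so a model that already classifies a sample confidently is not affected by that sample’s weight.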
The hypothesis that neural data can benefit the machine learning process is very interesting. Will performance improve with the “help” of activity data from the brain, even if that data is “coded”? The authors evaluated the two models from two angles: the improvement over baseline performance, and the differing degrees of improvement associated with individual regions of interest (ROIs) in the brain.
Improvements in Model Performance
The authors considered two kinds of image features in the performance evaluation: Histogram of Oriented Gradients (HOG) features and Convolutional Neural Network (CNN) features. HOG corresponds to V1-like, or “primitive”, features, since V1 is the area responsible for the first and coarsest stage of cortical visual processing. CNN features correspond to more sophisticated processing, namely the learned feature representations of the ventral visual stream. Neuroscientists have proposed two streams of visual processing in the human brain: the ventral stream (the “what” pathway), a series of areas leading to object recognition, and the dorsal stream (the “where” pathway), which is more concerned with spatial understanding and motion detection.
Comparisons between the two models show promising results. In panels (A) and (B) of the accompanying figure, significant improvements can be seen in all four object categories (p < 0.01, paired one-tailed t-test) when AWL was adopted. Classification accuracy improved much more under AWL with HOG features than with CNN features. This suggests that brain activity data may benefit the learning of more “primitive” features more than that of higher-level, more sophisticated visual representations.
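For readers unfamiliar with the paired t-test mentioned above, the statistic can be computed as in the sketch below. The accuracy values are made up for illustration and are not results from the paper; the function name is also hypothetical. Turning the statistic into a one-tailed p-value additionally requires the Student t distribution, which is left to a statistics library or a t-table.

```python
import math
import statistics


def paired_t_statistic(baseline, treated):
    """Return (t, degrees_of_freedom) for paired samples.

    A positive t means `treated` tends to exceed `baseline`, which is the
    direction a one-tailed test for improvement looks at.
    """
    diffs = [t - b for b, t in zip(baseline, treated)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / std_err, n - 1


# Hypothetical accuracies from five matched train/test splits.
baseline_acc = [0.70, 0.72, 0.69, 0.71, 0.73]
awl_acc = [0.74, 0.75, 0.72, 0.76, 0.75]

t_value, dof = paired_t_statistic(baseline_acc, awl_acc)
print(f"t = {t_value:.2f} with {dof} degrees of freedom")
```

The test is paired because both models are evaluated on the same splits, so each difference removes the split-to-split variation that the two accuracies share.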
Limiting the experiments to specific ROIs gives even more interesting results. The extrastriate body area (EBA), fusiform face area (FFA) and parahippocampal place area (PPA) are thought to respond to different visual cues in object recognition. EBA lights up when objects containing body parts are recognized. FFA is considered to be involved in face recognition, especially of human faces. PPA has been shown in fMRI to respond more strongly to scenes depicting places than to other kinds of visual stimuli. According to the results shown in (C), the two categories that contain body parts, people and animals, saw the most impressive reductions in error rate. For buildings there is also a small reduction in error rate, but it is not as significant. The results in (D) and (E) also show statistically significant improvements in the categories corresponding to specific ROIs, suggesting that brain activity does not benefit machine learning in a random way. Rather, the activity patterns identified in neuroscience (e.g., FFA lighting up when subjects see human faces) improve machine learning performance in a systematic way.
Analysis of ROIs
To further test the significance of individual ROIs in generating salient activity weights that improve classification accuracy, the authors generated a null distribution of 1,000,000 samples for each object category and set of image features. Each sample reflects how often a random set of 64 ROI combinations would have above-average classification accuracy. (With seven ROIs there are 127 non-empty combinations, and exactly 64 of them contain any given ROI.) If an ROI did not contribute to the observed improvements in accuracy, the number of above-average combinations containing that ROI would fall near the mean of the null distribution.
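The procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors’ implementation: the accuracies are synthetic (combinations containing EBA are given an artificial boost), the helper names are hypothetical, and only 2,000 null samples are drawn instead of 1,000,000 to keep the run short.

```python
import itertools
import random

ROIS = ["EBA", "FFA", "LO", "OFA", "PPA", "RSC", "TOS"]


def all_combinations(rois):
    """Every non-empty subset of the ROIs (127 for seven regions)."""
    combos = []
    for size in range(1, len(rois) + 1):
        combos.extend(itertools.combinations(rois, size))
    return combos


def count_above_average(combos, accuracy, mean_accuracy):
    """Number of combinations whose accuracy exceeds the overall mean."""
    return sum(1 for c in combos if accuracy[c] > mean_accuracy)


def null_distribution(combos, accuracy, mean_accuracy, n_samples,
                      set_size=64, seed=0):
    """Counts of above-average combinations in random sets of `set_size`."""
    rng = random.Random(seed)
    return [
        count_above_average(rng.sample(combos, set_size), accuracy, mean_accuracy)
        for _ in range(n_samples)
    ]


combos = all_combinations(ROIS)

# Synthetic accuracies, NOT data from the paper: EBA combinations get +0.03.
noise = random.Random(1)
accuracy = {
    c: 0.70 + 0.03 * ("EBA" in c) + noise.uniform(-0.01, 0.01) for c in combos
}
mean_accuracy = sum(accuracy.values()) / len(accuracy)

# Observed statistic for one ROI: the 64 combinations that contain it.
eba_combos = [c for c in combos if "EBA" in c]
observed = count_above_average(eba_combos, accuracy, mean_accuracy)

null = null_distribution(combos, accuracy, mean_accuracy, n_samples=2000)
threshold = sorted(null)[int(0.95 * len(null))]  # empirical 95th percentile

print(f"observed={observed}, 95% threshold={threshold}")
```

An ROI whose observed count lies well above the threshold contributes more than a random grouping of combinations would, which is the logic behind the significance thresholds discussed next.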
As shown in the figure below, EBA plays a very important role in the above-average accuracy for both the people and animal categories. In fact, EBA far exceeds the significance threshold (95% confidence interval), which strongly supports its importance. Interestingly, FFA does not cross the significance threshold, although it is usually thought to be closely tied to the perception of human faces.
The results for buildings and food are harder to interpret. PPA improves the classification accuracy not only for buildings but also for food. Other areas such as EBA, OFA and FFA also appear to contribute to the improvements, although not as significantly. The reasons behind this observation may be complicated. First, the activation of brain areas is never clean-cut: most of the time, scientists look only at “the most activated area” rather than at every area that is “simply activated”. Second, buildings and food are categories whose appearance varies widely, while HOG captures only unsophisticated features. Put simply, HOG might be better at recognizing objects that do not change much in appearance. Recognizing the same kind of object across distinct appearances is the job of higher-level layers, and so might be better reflected in CNN features.
While the paper focused on visual object recognition and fMRI data, the framework can be extended to other sensory modalities such as audition and somatosensation, to other neuroimaging techniques, and to different machine learning algorithms. The significance of this approach is that it reveals many opportunities for closer interaction between machine learning and neuroscience.
Beyond the implications mentioned in the paper, brain-computer interface applications might also draw inspiration from this work. Several studies are already investigating how to incorporate brain activity data into the machine learning process to optimize interactions with users. For example, the conjecture that neural data can be used to translate brain waves into text has been supported by some preliminary experiments. With the support of this study, researchers can be more confident that neural data contributes to the training process, even though the exact connection between thinking of a specific word and the brain waves generated cannot yet be explained.
In other applications, real-time monitoring of brain activity could be used to adjust the machine learning process, so that a more flexible, more human-centered algorithm could be realized. For example, virtual reality (VR) has been applied in the treatment of autism and has been reported to be effective in the literature. Learning is a comparatively difficult process for most autistic children, and these children often lose what they have learned without timely consolidation. A real-time connection between brain data and machine learning might therefore save both time and effort.
- The Parahippocampal Place Area (http://www.cell.com/neuron/fulltext/S0896-6273(00)80758-8?_returnURL=http%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS0896627300807588%3Fshowall%3Dtrue&cc=y)
- Using Human Brain Activity to Guide Machine Learning (https://arxiv.org/abs/1703.05463)
- Two-streams hypothesis (https://en.wikipedia.org/wiki/Two-streams_hypothesis)
Author: Yujia Liu | Editor: Joni Zhong