Last month’s ReWork Deep Learning Summit in London provided a peek at current research progress and future trends in artificial intelligence. The two-day event featured top scientists and engineers from Facebook, the MIT Media Lab, DeepMind and other leading institutions.
The first speaker was Facebook Software Engineer Fabrizio Silvestri, who graduated summa cum laude in computer science from the University of Pisa. Silvestri worked at Yahoo Labs Barcelona in Spain from 2013 to 2015 before joining the Facebook Search Systems team in London, UK. His research focuses on data and web mining, web search, big data, information retrieval, and computational advertising.
In his presentation, Silvestri explained that Facebook invests in content-understanding research for four main reasons: its unique mobile content and footprint, its social graph, authentic identities, and interest-based personalization. The social media giant uses large neural networks to free humans from tedious hand-crafted feature engineering. Its method for mapping search queries into vector-space embeddings, for example, has proven to work well in scenarios where feature design is difficult, such as with images and videos.
Silvestri suggested that embedding search queries minimizes out-of-vocabulary (OOV) problems, since even an unseen query shares character n-grams with queries seen during training. The model is trained with a triplet loss, and the best-performing encoder averages the embeddings of word unigrams, word bigrams, and 3-to-5 character n-grams, followed by a fully connected network.
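The two ingredients described above can be sketched in a few lines. This is a minimal illustration, not Facebook's actual model: it hashes word unigrams, bigrams, and 3-to-5 character n-grams into a random embedding table, averages them, and computes a triplet loss; the fully connected network and learned parameters are omitted, and all names here are hypothetical.

```python
import zlib
import numpy as np

def ngram_features(query, cmin=3, cmax=5):
    """Word unigrams, word bigrams, and 3-to-5 char n-grams of a query."""
    words = query.lower().split()
    feats = list(words)                                    # word unigrams
    feats += [" ".join(p) for p in zip(words, words[1:])]  # word bigrams
    for w in words:
        token = f"#{w}#"  # boundary markers, fastText-style
        for n in range(cmin, cmax + 1):
            feats += [token[i:i + n] for i in range(len(token) - n + 1)]
    return feats

def embed(query, table, buckets=1000):
    """Average the hashed-feature embeddings. An unseen word still shares
    char n-grams with seen words, which is why OOV queries get sensible
    vectors."""
    rows = [table[zlib.crc32(f.encode()) % buckets] for f in ngram_features(query)]
    return np.mean(rows, axis=0)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, d(a, p) - d(a, n) + margin) with Euclidean distance."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(0)
table = rng.normal(size=(1000, 16))      # toy 16-d embedding table
a = embed("cheap flights london", table)
p = embed("cheap flight to london", table)  # semantically close query
n = embed("banana bread recipe", table)     # unrelated query
loss = triplet_loss(a, p, n)
```

In a real system the embedding table and the network on top are trained so that clicked (query, result) pairs pull together and random negatives push apart; here the table is random, so only the mechanics are shown.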
Click here for more on Silvestri’s research and publications: http://pomino.isti.cnr.it/~silvestr/
Next up was Àgata Lapedriza, a professor at Universitat Oberta de Catalunya and visiting researcher at the Massachusetts Institute of Technology (MIT) Media Lab. She is also a member of the BCN Perceptual Computing Lab and the Computer Vision Centre. Lapedriza’s research involves image understanding, scene recognition and characterization, and affective computing.
Lapedriza pointed out the importance of scene understanding technologies for automatic emotion recognition, a challenging task she has been exploring for the past ten years. Automatic recognition of emotions has a number of applications in environments where machines collaborate with humans. Commercial software is already available for recognizing emotions from facial expressions, as most research on image sentiment recognition has focused on human faces.
The context of an image, however, is also fundamental to understanding people’s emotional states. Lapedriza presented a Scene Recognition Demo: given a picture, the CNN-based system predicts the scene category and other attributes, and provides a heatmap indicating the regions of the image that support the prediction.
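A common way to produce such supporting-region heatmaps is Class Activation Mapping (CAM), which weights the final convolutional feature maps by the classifier weights of the predicted class. The sketch below shows only that mechanic on random arrays; it is an assumption that the demo uses a CAM-style technique, and the data here is synthetic, not from the Places model.

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """CAM-style heatmap: the class-weighted sum of the final conv
    layer's feature maps, normalized to [0, 1] for display.
    feature_maps: (C, H, W); class_weights: (C,) for one class."""
    cam = np.tensordot(class_weights, feature_maps, axes=1)  # -> (H, W)
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()
    return cam

rng = np.random.default_rng(1)
fmaps = rng.random((8, 7, 7))  # stand-in for final-conv activations
w = rng.random(8)              # stand-in weights for the predicted class
heatmap = class_activation_map(fmaps, w)
```

Upsampled to the input resolution and overlaid on the photo, such a map highlights which regions drove the scene prediction.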
Click on the link to view the demo: http://places2.csail.mit.edu/demo.html.
Lapedriza’s recent papers include Emotion Recognition in Context (accepted at CVPR 2017), and Places: A 10 Million Image Database for Scene Recognition.
The third speaker was DeepMind Senior Research Scientist Raia Hadsell, who previously worked at SRI International and with the Vision and Robotics group at Princeton. Hadsell’s research interests include challenges in Artificial General Intelligence (AGI) such as continual and transfer learning, deep reinforcement learning, and neural networks for navigation.
Hadsell’s presentation focused on deep reinforcement learning in complex environments. Her team proposed an end-to-end deep reinforcement learning approach that enables computers to learn to navigate cities. Tasked with traversing to target destinations kilometres away, the model cannot access a map and is not given its current location — it must infer its position entirely from Google Street View imagery of the city.
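The core idea — learning to reach a goal purely from reward, with no map or planner — can be illustrated with a far simpler analogue: tabular Q-learning on a small grid. This toy sketch is in no way the paper's architecture (which uses deep networks over Street View images); it only shows the reward-driven learning loop, and all names and parameters are illustrative.

```python
import random

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def train_navigator(size=5, goal=(4, 4), episodes=2000, seed=0):
    """Tabular Q-learning: the agent is never given a map; it learns
    which action is best in each state from reward alone."""
    rng = random.Random(seed)
    q = {}  # (state, action_index) -> estimated value
    alpha, gamma, eps = 0.5, 0.9, 0.2
    for _ in range(episodes):
        s = (0, 0)
        for _ in range(100):
            if s == goal:
                break
            a = (rng.randrange(4) if rng.random() < eps
                 else max(range(4), key=lambda i: q.get((s, i), 0.0)))
            dx, dy = ACTIONS[a]
            nxt = (min(max(s[0] + dx, 0), size - 1),
                   min(max(s[1] + dy, 0), size - 1))
            r = 1.0 if nxt == goal else -0.01  # reward only at the target
            best_next = max(q.get((nxt, i), 0.0) for i in range(4))
            q[(s, a)] = q.get((s, a), 0.0) + alpha * (
                r + gamma * best_next - q.get((s, a), 0.0))
            s = nxt
    return q

def greedy_path_length(q, size=5, goal=(4, 4)):
    """Follow the learned greedy policy from (0, 0); return the number
    of steps to the goal, or None if it fails within 50 steps."""
    s, steps = (0, 0), 0
    while s != goal and steps < 50:
        a = max(range(4), key=lambda i: q.get((s, i), 0.0))
        dx, dy = ACTIONS[a]
        s = (min(max(s[0] + dx, 0), size - 1),
             min(max(s[1] + dy, 0), size - 1))
        steps += 1
    return steps if s == goal else None

q = train_navigator()
```

The Street View task replaces the tiny state table with a deep network that must infer where it is from raw images, but the train-by-reward loop is the same in spirit.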
Hadsell’s paper Learning to Navigate in Cities Without a Map is on arXiv.
Qiang Huang from the Centre for Vision, Speech and Signal Processing (CVSSP) at the University of Surrey works on multimodal information processing using deep neural networks.
His presentation covered image synthesis using two-stage Generative Adversarial Networks (GANs), in this case to draw colorful pictures of birds from only a small amount of training data. Huang’s research team simulated the procedure of drawing a picture: making an outline, drawing the contours and edges of the object, then adding colors and shading. The first GAN generates the object’s shape as a monochrome image, and the second paints in color. The approach can generate synthetic images of quality comparable to real ones.
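The staged pipeline can be illustrated with toy stand-ins for the two generators. This sketch is not Huang's code and contains no adversarial training at all: stage one emits a binary silhouette (standing in for the shape GAN) and stage two fills it with color (standing in for the coloring GAN), just to show how the output of one stage conditions the next.

```python
import numpy as np

def stage1_shape(rng, size=32):
    """Stand-in for the stage-1 generator: emit a binary silhouette
    (a disc), representing the learned object shape/outline."""
    yy, xx = np.mgrid[:size, :size]
    cx, cy, r = rng.integers(10, 22, size=3)
    return ((xx - cx) ** 2 + (yy - cy) ** 2 <= r ** 2).astype(np.float32)

def stage2_colorize(mask, rng):
    """Stand-in for the stage-2 generator: condition on the stage-1
    silhouette and fill it with color on a white background."""
    color = rng.random(3).astype(np.float32)     # random RGB fill
    img = np.ones((*mask.shape, 3), np.float32)  # white background
    img[mask > 0] = color
    return img

rng = np.random.default_rng(42)
mask = stage1_shape(rng)           # stage 1: shape only
image = stage2_colorize(mask, rng) # stage 2: add color
```

In the real system each stage is a generator/discriminator pair trained adversarially, and decomposing the task this way is what lets the model cope with little training data.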
Huang’s paper Synthesis of Images by Two-Stage Generative Adversarial Networks is available online.
This was the 4th annual European edition of the ReWork Deep Learning Summit. Upcoming ReWork events include a Deep Learning Summit and AI for Government Summit in Toronto this October 25 & 26; and the Machine Learning for DevOps and Applied AI Summit in Houston November 29 – 30.
Author: Zi Shao | Editor: Tony Peng, Michael Sarazen