RE·WORK: Deep Learning in Retail Summit (London, UK)

RE•WORK is one of the top global Machine Intelligence industry conference/summit organizers. The aim of Rework: Combining entrepreneurship, technology & science to re-work the future.



Time: June 1-2, 2017

Location: ETC Venues 155 Bishopsgate Liverpool St London EC2M 3YD

Introduction for this Rework conference:

In general, RE•WORK summits invite extraordinary speakers to present advances in deep learning and artificial intelligence from the world’s leading innovators, while showcasing the opportunities of advancing trends in deep learning and their impact on business & society. Many data scientists, machine learning scientists and entrepreneurs attend this summit to meet people with similar interests and discuss the technology shaping the future.

Under the title “Discover the latest deep learning advancements and how to leverage methods to improve advertising and the retail experience”, the summit attracted companies such as IBM and Amazon. The topics included Deep Learning Trends and Customer Insight, Forecasting and Recommendations, Warehouse and Stock Optimization, and Computer Vision and Image Recognition. Many of the speakers came from startups, which brought a lot of energy to the conference.

This is a compact report on the summit, introducing each speaker and the content of each talk.

On June 1st, ten speakers talked about the application of machine learning and deep learning in the retail field.

The first speaker is Ben Chamberlain, the senior data scientist from ASOS.


In this talk, Chamberlain started with the concept of customer lifetime value (CLTV): how much value a customer creates during his or her entire relationship with the company [1]. He presented a random forest approach and then compared RF, DNN and logistic regression in terms of both efficiency and cost. Then he offered a solution for calculating CLTV using a wide & deep model, which combines logistic regression with a deep neural network. After that, he introduced embeddings and hyperbolic space, both of which are used at ASOS.
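To make the wide & deep idea concrete, here is a minimal sketch of a forward pass, assuming one linear ("wide") part and one hidden ReLU layer for the "deep" part. The function names, weights and feature values are all made up for illustration; the talk did not describe the actual ASOS architecture or how its output is mapped to CLTV.

```python
import math

def relu(values):
    # Element-wise rectified linear unit.
    return [max(0.0, v) for v in values]

def dense(x, weights, bias):
    # weights holds one row per output unit.
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def wide_and_deep(x, wide_w, wide_b, hidden_w, hidden_b, out_w, out_b):
    # Wide part: a plain linear model on the raw features,
    # i.e. the logit of a logistic regression.
    wide_logit = sum(w * v for w, v in zip(wide_w, x)) + wide_b
    # Deep part: one hidden ReLU layer followed by a linear output.
    hidden = relu(dense(x, hidden_w, hidden_b))
    deep_logit = sum(w * h for w, h in zip(out_w, hidden)) + out_b
    # The two logits are summed and squashed into a single score.
    return sigmoid(wide_logit + deep_logit)

# Hypothetical customer features and hand-picked weights.
features = [1.0, 2.0]
score = wide_and_deep(
    features,
    wide_w=[0.5, -0.25], wide_b=0.0,
    hidden_w=[[1.0, 0.0], [0.0, -1.0]], hidden_b=[0.0, 0.0],
    out_w=[2.0, 3.0], out_b=-2.0,
)
print(score)
```

In a real system both parts would be trained jointly; the sketch only shows how the two logits are combined.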

The second speaker is Kumar Ujjwal, senior product manager of big data & machine learning at Kohl’s Department Stores. He shared research on computer vision and natural language processing that helps customers make smart decisions while shopping. His talk was divided into two parts: the micro-moment trend in retail, and leveraging big data and machine learning for micro-moments. He talked about the importance of micro-moments and what people are thinking about when they search online, especially on a phone. Since people often search online before they step into a store, what customers do online gives the company a lot of information to analyze and use to make better recommendations for each user. Essentially, they make product discovery more natural and easier through natural language search, visual search and personalized content. They use user behavioral analytics tools supported by big data and machine learning to make recommendations. They also built their own system, called AI First Decision Making Approach, to provide a personalized experience in real time.


The third speaker is Jan Gasthaus, machine learning scientist at Amazon. He showed their autoregressive recurrent networks, which predict the future probability distribution of items based on past data. If we knew the distribution of items in the future, we could use our resources more sensibly, for example by reducing excess inventory in the supply chain. Traditional approaches like Box-Jenkins or state-space models require a lot of manual work by experts and cannot learn patterns across time series. Then he introduced feed-forward neural networks. These compose complex black-box functions from simpler building blocks and learn them end-to-end, but their outputs are not correlated across time, which is a poor fit for this kind of time-series prediction task. Therefore, they proposed using autoregressive recurrent networks with LSTM (long short-term memory) cells to solve this problem. Applied to forecasting, the model he designed yields a flexible, accurate and scalable forecasting system. In addition, it can learn complex temporal patterns across time series. The paper is available on arXiv.


The next speaker is Rami Al-Salman, data scientist & machine learning engineer at Trivago. He talked about their methods for making good hotel recommendations for both text search and image search. Whether people input text or an image, the search engine can return the right results for their query. He started with artificial neural networks, the basic neural network model, and then turned to word embeddings. He talked about two models for representing words as vectors: Skip-gram and Continuous Bag of Words. Finally, he introduced their deep artificial neural networks, based on the paper DeepTags: Integration of Various VGI Resources Towards Enhanced Data Quality. Their product can be tried on the website by googling trivago, but so far it only offers text search.
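The difference between the two word-vector models is easiest to see in the training examples they consume. The sketch below only generates those examples: Skip-gram predicts each context word from the center word, while Continuous Bag of Words predicts the center word from its context. The function names, window size and sample query are illustrative assumptions, not taken from the talk.

```python
def skipgram_pairs(tokens, window=1):
    # One (center, context) pair per neighbouring word.
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=1):
    # One (context_words, center) example per position.
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        pairs.append((context, center))
    return pairs

# Hypothetical tokenized search query.
query = ["hotel", "near", "beach"]
sg = skipgram_pairs(query)
cb = cbow_pairs(query)
print(sg)
print(cb)
```

A real model would feed these examples into a shallow network and keep the learned input weights as the word vectors.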


The fifth talk is about an online shopping company called Picnic. Daniel Gebler, the CTO of Picnic, told us they use machine learning for customer behavioral analysis and a prediction engine. He started with two challenges of bulk recommendations: precision and seasonal variation. Then he demonstrated how the problem is formalized and how to solve it with a neural network. His talk mainly focused on big data (a broad range of many different customers) and deep data (the order history of one customer), using an LSTM RNN and an RFM-based (recency, frequency, monetary) strategy to preprocess the data. Because shopping data is distinctive and different seasons bring out different features of the data, manual preprocessing can improve the model.
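As an illustration of RFM-style preprocessing, the sketch below turns one customer's order history into three features, using the conventional reading of RFM as recency, frequency and monetary value. The function name, field names and order data are hypothetical; the talk did not specify how Picnic computes these features.

```python
from datetime import date

def rfm_features(orders, today):
    """Summarize one customer's orders, given as (order_date, amount) tuples."""
    if not orders:
        return None
    last_order = max(order_date for order_date, _ in orders)
    return {
        # Days since the most recent order.
        "recency_days": (today - last_order).days,
        # Number of orders placed.
        "frequency": len(orders),
        # Total amount spent.
        "monetary": sum(amount for _, amount in orders),
    }

# Hypothetical order history for a single customer.
history = [(date(2017, 5, 1), 40.0), (date(2017, 5, 20), 25.5)]
features = rfm_features(history, today=date(2017, 6, 1))
print(features)
```

Features like these could be fed to a model alongside the raw order sequence.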


The sixth speaker, Calvin Seward, research scientist at Zalando, showed us the best way to pick items in a warehouse. He covered three key topics: the picker routing problem, the order batching problem, and a neural network estimate of pick route length. They developed an algorithm called OCaPi and estimate its result using a convolutional neural network with ReLUs. In the future, he might use reinforcement learning to solve the batching problem, because it can also be seen as a game like Go.



The next speaker is Pau Carre Cardona from Gilt. His topic was deep learning for product faceting and similarity, using the product image and the text description separately. He introduced the automated faceting mechanism they use to improve dataset quality. Then he talked about ResNet and spatial transformers, which can be used to locate features in the product image. After that, he turned to text descriptions and talked about using dilated convolutions in place of recurrent neural networks. Dilated convolutions can detect patterns between words that are distant from each other (standard convolutions only detect patterns between words close to each other). Finally, he told us they use an unsupervised method to measure product image similarity, with the distance between embeddings as the dissimilarity metric. Therefore, given a product, they can retrieve the top-N similar products.

Spatial transformers are explained here

and dilated convolutions for NLP are explained here
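The following minimal sketch shows why a dilated convolution sees more distant positions than a standard one with the same kernel size. It is a plain 1D "valid" convolution (strictly, a cross-correlation) over a list of numbers, with a hypothetical function name and toy input; a text model would apply the same idea to sequences of word vectors.

```python
def dilated_conv1d(seq, kernel, dilation=1):
    # Distance between the first and last input position one output touches.
    span = (len(kernel) - 1) * dilation
    out = []
    for start in range(len(seq) - span):
        out.append(sum(k * seq[start + i * dilation]
                       for i, k in enumerate(kernel)))
    return out

signal = [1, 2, 3, 4, 5, 6]
kernel = [1, 1, 1]

# dilation=1: each output covers 3 adjacent positions.
standard = dilated_conv1d(signal, kernel, dilation=1)
# dilation=2: each output covers positions i, i+2, i+4 (a span of 5).
dilated = dilated_conv1d(signal, kernel, dilation=2)
print(standard)  # [6, 9, 12, 15]
print(dilated)   # [9, 12]
```

With the same three weights, the dilated version relates positions that are four steps apart, which is the property that makes it a candidate replacement for recurrent networks on text.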


Next, Miroslav Kobetski, co-founder of Volumental, showed their technology for measuring the body using 3D scanning with CNNs. To provide consumers with the best recommendations and engaging shopping assistance, they try to obtain accurate data about the consumer for analysis. He described a new technique that uses a CNN to calculate the distance between two images in order to find pictures similar to a query. This lets them reduce the annotation effort needed to reach high accuracy on new types of visual data.


Next, Susana Zoghbi, postdoctoral researcher at KU Leuven, presented their research on attribute extraction. With their research, people can perform a novel cross-modal search task in fashion, develop novel representations for cross-modal translation from noisy data, and annotate images. She started with their goal: to translate images into text and vice versa. For example, it could extract “red” from a picture of a red hat. This is useful for improving product recommendations with fine-grained attributes. They used bag of words and semantic word embeddings for textual representations; scale-invariant feature transform and convolutional neural networks for image representations; and bilingual latent Dirichlet allocation, canonical correlation analysis and neural networks as alignment models. Their results show that it is possible to design algorithms that automatically “translate” visual concepts into text and vice versa.


Finally, Arnau Ramisa, senior computer vision researcher at Wide Eyes Technologies, showed an interesting method for searching just by capturing a picture on a phone. Wide Eyes Technologies provides its technology to fashion companies for use in their applications. He started with the question of what should be used as a query: in his view, search by text is out of date and we should use pictures now. They use Siamese networks to calculate the similarity of two pictures and then return the top-n most similar pictures (products). They can find almost every product in a picture and recommend similar products, as long as those products are in their dataset. The key point of their technology is that it can filter out noise and identify different products at the same time.
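Once a Siamese network has mapped each picture to an embedding vector, top-n retrieval reduces to a nearest-neighbour search. The sketch below assumes the embeddings already exist and uses Euclidean distance; the catalog, the vectors and the function names are invented for illustration and are not Wide Eyes' actual system.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_n_similar(query_vec, catalog, n=2):
    # Rank catalog items by distance to the query embedding, closest first.
    ranked = sorted(catalog.items(),
                    key=lambda item: euclidean(query_vec, item[1]))
    return [name for name, _ in ranked[:n]]

# Hypothetical 2-D embeddings; real ones would have many more dimensions.
catalog = {
    "red_hat": [1.0, 0.0],
    "red_cap": [0.9, 0.1],
    "blue_jeans": [0.0, 1.0],
}
matches = top_n_similar([1.0, 0.05], catalog, n=2)
print(matches)  # ['red_hat', 'red_cap']
```

For a large catalog, an approximate nearest-neighbour index would usually replace the full sort.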


There were eight speakers on the second day (June 2).

First, Deepomatic showed their product, which helps customers build their own datasets and train models on them. It is offered as a service. People can use it to work like a machine learning engineer without knowing how to build a machine learning model; all users need to do is build a dataset. Augustin Marty, Co-Founder & CEO of Deepomatic, started with Siamese CNNs, similar to the Wide Eyes talk on the first day. Then he presented the performance of their technology and a demo. Their company offers a tool for building an AI for each company, which these companies can then use to analyze customers and offer better services.


The second speaker is Jekaterina Novikova from Heriot-Watt University. Her talk was about Pepper, a fairly famous robot developed by people from the UK, France and Japan. She talked about the application of Pepper in retail as a social robot that helps customers. Evaluation is the first challenge when trying to determine whether the robot conducts a good dialogue. Machine learning is used both to develop the dialogue strategy between robot and customer and to evaluate the sentences generated by the robot (by comparing the similarity of the human answer and the robot answer and scoring the robot answer). They used reinforcement learning to combine task-related and chat-related dialogue according to human behavior, since correct rewards are a crucial factor in dialogue policy training. Finally, she concluded that social robots are coming to the retail industry and that ML is used both for developing dialogue strategy and for evaluating results.



The third speaker, from Skyscanner, started with the caching technology they use to speed up queries, which also created a problem: users could not get the latest price, and they complained about this seriously. Then he talked about the strategy they use to predict the binary change of a price (increase or decrease), which avoids frequently asking their partners for flight prices. The method they use is random forest, with more than ten features selected as inputs. Then he said they use augmented data to address the lack of visibility when Quote Age > TTL, and they developed a data-trace simulator of the cache to overcome the limitations of supervised evaluation. Finally, he shared embeddings as a way to encode location data. After the conference, we got in touch with him and talked about the acquisition of Skyscanner, and we were happy to hear that it has had basically no impact on them.
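The trade-off between speed and stale prices can be illustrated with a small time-to-live (TTL) cache. This is only a sketch under assumed names: the class, the route key and the ten-minute TTL are hypothetical, and the clock is passed in explicitly so the behaviour is easy to follow. It shows why a quote whose age exceeds the TTL is no longer visible from the cache.

```python
class QuoteCache:
    """Keeps the last known price per route for a limited time."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}

    def put(self, route, price, now):
        # Remember the price together with the time it was fetched.
        self._store[route] = (price, now)

    def get(self, route, now):
        entry = self._store.get(route)
        if entry is None:
            return None
        price, stored_at = entry
        # Quote age > TTL: the cached price is treated as expired.
        if now - stored_at > self.ttl:
            return None
        return price

cache = QuoteCache(ttl_seconds=600)
cache.put("LHR-JFK", 420.0, now=0)
fresh = cache.get("LHR-JFK", now=300)    # still within the TTL
expired = cache.get("LHR-JFK", now=601)  # older than the TTL
print(fresh, expired)
```

A longer TTL means fewer partner requests but a higher chance of showing an outdated price, which is the tension a price-change predictor is meant to ease.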


The fourth speaker is Kostas Perifanos from Argos. He started with the basic idea of word embeddings and some popular methods for computing them. Then he introduced the neural probabilistic language model, followed by the training process and results. They use embeddings to find synonyms for query terms: they convert the query sentence into a vector and then use it to look up synonyms, improving search accuracy.
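A simple way to picture embedding-based synonym lookup is query expansion by cosine similarity. The sketch below works per query term rather than on a whole-sentence vector, and the vocabulary, vectors, threshold and function names are all assumptions for illustration, not Argos' actual pipeline.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def expand_query(terms, embeddings, threshold=0.9):
    # Add every vocabulary word whose vector is close to a query term.
    expanded = list(terms)
    for term in terms:
        vec = embeddings.get(term)
        if vec is None:
            continue  # unknown word: nothing to expand
        for other, other_vec in embeddings.items():
            if other not in expanded and cosine(vec, other_vec) >= threshold:
                expanded.append(other)
    return expanded

# Hypothetical 2-D word vectors.
embeddings = {
    "sofa": [0.9, 0.1],
    "couch": [0.88, 0.15],
    "kettle": [0.1, 0.95],
}
expanded = expand_query(["sofa"], embeddings)
print(expanded)  # ['sofa', 'couch']
```

The expanded term list can then be passed to an ordinary keyword search.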



The fifth speaker is Tom Charman, co-founder & CEO of KOMPAS. His talk focused on the importance of leveraging data to train machines to learn behavioral patterns and make recommendations. He looked at the accuracy of these recommendations and at how the success of such machines and algorithms can be tested. His talk covered computer vision (object recognition and facial recognition), NLP (machine translation, sentiment analysis and chatbots) and machine learning (pattern analysis and clustering data). Finally, he discussed the application of AI in the retail field.




The sixth speaker is Ekaterina Volkova-Volkmar from Codec. She argued that finding out what customers need is a serious matter for every company, because people nowadays waste too much time on meaningless things like boring videos on YouTube. Codec helps companies understand their customers and helps customers make decisions. She started with the challenges that companies and customers are facing now. They use five questions (What are they interested in? Who do they listen to? Who are these people? How do they interact? How does everything change across time?) to define a tribe and build a community around it. They can offer companies an analysis based on this tribe, which lets the company make its recommendations more practical. Finally, she offered a good thought: big data is good, smart data is better.


The seventh talk, on next-generation consumer analytics, was given by Cathal Gurrin, a lecturer from Dublin University, who showed how to use data collected by wearable devices to analyze a consumer. He said this is a new era of personal sensing that allows us to understand people in previously unimaginable detail. First, he described the three steps of consumer analytics (Professional & Expensive; Lower-Cost Data Driven; Low cost – high volume – Extreme insights). Then he introduced the wearable devices and the various data they collect. Finally, he agreed that people will engage in this data collection and that deep consumer analytics will become as simple as a Google search.


The last talk was given by Ofri Ben Porat, CEO & Co-Founder of Pixoneye. Pixoneye can build a profile of a specific person from the photo gallery on his or her phone. They said they only extract information about the user and do not care what an individual photo shows, so it does not matter if a photo is noisy. He started with the lifeline and then turned to the photo gallery; the two are similar. Then he introduced image understanding and contextual understanding, the two technologies they currently apply.





Author: Junyi Li
