
Synced Exclusive Interview with “Fine Art” Team Lead: “Stronger Models through Brand New Reinforcement Learning Techniques”

Synced had the opportunity for an exclusive interview with Tencent AI Lab’s senior manager and Fine Art team lead, Liu Yongsheng.

On March 19, Tencent’s Fine Art defeated all of its opponents to win the Computer Go UEC Cup, a tournament open only to Go AIs. After the competition, Tencent AI Lab announced that it will publish all of Fine Art’s technical details in an effort to advance Go AI development. Synced had the opportunity for an exclusive interview with Tencent AI Lab’s senior manager and Fine Art team lead, Liu Yongsheng, who discussed the secrets behind Fine Art.

On March 19, the 10th annual Computer Go UEC Cup concluded in Tokyo, Japan. This was the first time Tencent AI Lab’s Go AI “Fine Art” competed on the world stage, and it took the competition by storm. After winning seven consecutive games on March 18 to advance to the Round of 16, it won four more games in a row to take home the UEC Cup. In the final, Fine Art faced Japan’s DeepZenGo and defeated it in 29 minutes (196 moves). Following the tradition of the UEC Cup, Fine Art will face Japanese professional Go player Ryō Ichiriki in an exhibition match on March 26.

The Computer Go UEC Cup is an annual worldwide computer Go tournament held at the University of Electro-Communications (UEC) in Tokyo, Japan, since 2007. Every year, the world’s top Go AI teams convene in Tokyo to compete and exchange ideas. Past winners include Japan’s DeepZenGo (3 times) and France’s Crazy Stone (4 times). Facebook’s Dark Forest also reached the final in 2016, finishing as that year’s runner-up. This year’s competition invited 30 teams from around the world and was made especially notable by the attendance of Tencent’s Fine Art. The Japanese Go Association “Nihon Ki-in” even sent Wang Ming-wan, a 9 dan professional, to provide live commentary, a sign of how highly it regards the computer Go competition.

According to reports, Google DeepMind’s AlphaGo was invited but declined. However, this does not mean the DeepMind team has stopped developing AlphaGo: it is poised to face Chinese Go master Ke Jie in April.


Figure 1: Results starting from the Round of 16

“We are very happy that Fine Art was able to win the UEC Cup; this was a very precious experience for us. Fine Art is different from other lab-built AIs, as it benefited from the tutelage of world-class Go players and grew step by step from playing against them. We hope Fine Art can draw more attention to the traditional game of Go,” said Yao Xing, Tencent VP and Director of Tencent AI Lab, after the competition. “Fine Art’s value goes beyond Go itself. We have made tremendous progress and innovation in deep learning and reinforcement learning, and we will publish these innovations and our dataset details as academic papers to help advance Go AI. At Tencent AI Lab, we want AI to become widespread, so we will take an open and collaborative approach and work together with the industry to advance AI technologies.”

Prior to competing in the UEC Cup, Fine Art used several IDs to play against amateur and professional Go players on Tencent’s Go platform, Fox Go. Fine Art defeated numerous Go experts from China, Japan and Korea, and became the first 10 dan player on the platform. As of March 9, Fine Art had played 534 matches, with 406 wins and 128 losses, a 76% win rate. It has played against over 100 well-known Go players, including Ke Jie, Gu Li, Chang Hao, Fan Yunruo, Fan Tingyu, and Junghwan Park.

Like AlphaGo, the first AI to defeat a human Go master (and one that has done so many times since), Fine Art is trained primarily on databases of human Go games and by playing against itself. Its core algorithm is based on a Policy Net and a Value Net, with innovative methods that greatly improved the accuracy of its Value Net and gave it a much better strategic view of the game. Simply put, Policy refers to the decision made at each move: choosing good moves and ignoring bad ones. It is a microscopic evaluation. Value refers to understanding the game as a whole to estimate the current chance of winning. It is a macroscopic evaluation.
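To make the microscopic/macroscopic distinction concrete, here is a minimal Python sketch. Fine Art’s actual networks have not been published, so everything in it is a hypothetical stand-in: `policy_head` turns made-up per-move scores into a probability for each candidate move (a softmax), and `value_head` squashes a single made-up position score into an estimated win probability (a logistic function). In AlphaGo, such scores come from deep convolutional networks reading the board; only the shape of the two outputs matters here.

```python
import math

# Toy illustration only: Fine Art's real networks are unpublished, and these
# functions are hypothetical stand-ins for a policy head and a value head.


def policy_head(move_scores: dict[str, float]) -> dict[str, float]:
    """Microscopic view: turn per-move scores into a probability for each move."""
    top = max(move_scores.values())  # subtract the max for numerical stability
    exps = {move: math.exp(s - top) for move, s in move_scores.items()}
    total = sum(exps.values())
    return {move: e / total for move, e in exps.items()}


def value_head(position_score: float) -> float:
    """Macroscopic view: one estimated chance of winning for the whole position."""
    return 1.0 / (1.0 + math.exp(-position_score))  # logistic squash into (0, 1)


# Hypothetical scores; a real network would compute these from the board.
probs = policy_head({"D4": 2.0, "Q16": 1.0, "A1": -3.0})
best = max(probs, key=probs.get)
win_prob = value_head(0.4)
print(best, round(probs[best], 3), round(win_prob, 3))
```

The policy output is one number per candidate move, while the value output is one number per position, which mirrors the move-by-move versus whole-game distinction.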

A year ago, Fine Art was only an idea. On January 28 of last year, Liu Yongsheng, a senior manager and expert engineer at Tencent AI Lab, received a message on the company’s internal messenger from Tencent VP Yao Xing. It read: are you confident about working on a Go AI? If not, we can start with a Chess AI. At the time, Liu did not have a clear idea of what a Go AI involved, so he replied that he needed to do some research first. During the Chinese New Year holidays, he read up on Go and Go AI and built a basic understanding of the field. After the holidays, on February 17, Yao Xing asked again whether the Go AI was a go, and Liu’s answer was again “need more research”. This time, however, he promised a demo within a month.

On March 4, the first demo was completed, playing at about 5 kyu (intermediate amateur) level. By the end of March, Tencent AI Lab officially launched the Go AI project, then called “WeiGo”, and assembled a team. By the end of June, Fine Art’s strength surpassed amateur 6 dan (advanced amateur), an important milestone in its development.

In August, Fine Art began playing on Fox Go (Tencent’s online Go platform) under its first pseudonym, and it defeated its first professional opponent on the 23rd. On September 4, under another pseudonym, Fine Art defeated an opponent named “tby” eight consecutive times. “tby” was the account of Ko Reibun, son of professional Go player Nie Weiping and a professional Go player himself. It was through this continuous training that Fine Art honed its skills.

On November 1, Fine Art officially launched on Fox Go under its real name and defeated world champion Jiang Weijie the very next day. On the 19th, it faced Ke Jie for the first time, finishing with 1 win and 1 loss. On the 28th, it faced Korea’s number one player, Junghwan Park, finishing with 5 wins and 1 loss. Since February 14 of this year, Fine Art’s win rate against national and world champions on Fox Go has stabilized at around 90%.


Fine Art has been playing against human players on Tencent’s Fox Go platform since August 2016, and its performance rose as the system improved. Since the start of this year, Fine Art has defeated several professional 9 dan players from China, Japan, and Korea, and on March 3 it became Fox Go’s first 10 dan player.

How could Tencent’s AI rise beyond the level of world champions so quickly? And what research directions is Tencent AI Lab, Fine Art’s developer, pursuing? After the UEC Cup victory, Synced interviewed Tencent AI Lab’s senior manager and Fine Art team lead, Liu Yongsheng, who revealed the secrets behind Fine Art.


Regarding this Competition

Synced: Let’s go back to before the competition. Did you guys discuss the potential outcome? Were you confident of winning?

Liu: The UEC Cup is by nature a platform for exchanging innovation and technology, and it brings together the world’s top Go AIs. We went in with the mindset of learning from our peers. We are very excited, and we also feel very lucky to have won.

Synced: Looking back on the competition, what were the most memorable moments? What were the technical challenges?

Liu: We were very nervous around the middle of the final, and we could feel that DeepZenGo’s play had improved since the round robin. It was a very respectable opponent, and Fine Art’s performance was phenomenal.

Synced: Would you comment on this year’s competitors (especially DeepZenGo and Crazy Stone)?

Liu: Over the last few years, these programs were the leaders in computer Go and made great contributions to the field. In the past year, both successfully incorporated neural networks into their systems and greatly improved their strength, especially DeepZenGo, which has a high win rate against professional players. Professional players think very highly of it.

Over the course of the two-day competition, we played DeepZenGo twice. Both games were very intense, with the first 100 moves evenly matched. It was Fine Art’s superior middle game and endgame that allowed us to pull through.

Synced: I heard that Tencent AI Lab’s 13-person team spent a year developing Fine Art. What are the team members’ backgrounds? Are there any Go experts?

Liu: Yes, the Fine Art team has 13 people: about half work on developing the algorithm and half on applying it. Every team member belongs to Tencent AI Lab, which was founded in 2016 to focus on the theoretical development and application of AI. Currently, the lab has over 50 world-class AI scientists (90% of them PhDs from top universities around the world) and over 200 experienced engineers.

In the Fine Art team, some members love Go and some know nothing about it. But the leadership has a few experts: the Director of AI Lab, Yao Xing, is an amateur 3 dan, and Lu Shan, President of the Technology and Engineering Group (TEG) under which AI Lab sits, is an amateur 5 dan. We even invited professional Go player Luo Xihe (professional 9 dan) to be Fine Art’s professional training partner, and that is without counting the numerous professional players on Fox Go. We aren’t exaggerating when we say Fine Art is a Go AI that improved by playing against professional players.

The Technology behind Fine Art

Synced: We know that in reinforcement learning, the optimal policy and the optimal cost function are both global optima, not local optima. The optimal policy decides how to play the next move so as to maximize the future win rate. How should one interpret the “microscopic” and “macroscopic” evaluations that Tencent News reported for Fine Art?

Liu: Simply put, Policy refers to the decision for every move, to choose good moves and ignore bad moves. It is a microscopic evaluation, or a move by move decision. Value refers to understanding the overall game to evaluate the current chance of winning. It is a macroscopic evaluation, or a strategic decision.

Synced: Monte Carlo Tree Search is a critical technique in AlphaGo. Was it also used in Fine Art?

Liu: Yes.
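Liu confirms the use of Monte Carlo Tree Search but gives no details. For reference, the AlphaGo paper describes a selection rule (often called PUCT) that combines the values backed up so far with the policy network’s prior: choose the move maximizing Q(s,a) + c_puct · P(s,a) · √(ΣN) / (1 + N(s,a)). The minimal sketch below shows only the selection and backup steps at a single root node. The priors, the constant `c_puct = 1.5`, and the fixed `leaf_value` table standing in for leaf evaluations are all hypothetical, and this is not Fine Art’s implementation; full MCTS would also expand child nodes and recurse down the tree.

```python
import math
from dataclasses import dataclass

# Not Fine Art's implementation: a single-node PUCT sketch following the
# selection rule published for AlphaGo, with hypothetical numbers.


@dataclass
class Edge:
    """Statistics for one candidate move at a search node."""
    prior: float            # P(s, a): prior probability from the policy network
    visits: int = 0         # N(s, a): how many simulations chose this move
    value_sum: float = 0.0  # W(s, a): sum of backed-up evaluations

    @property
    def q(self) -> float:
        # Q(s, a): mean backed-up value, taken as 0 for unvisited moves.
        return self.value_sum / self.visits if self.visits else 0.0


def select(edges: dict[str, Edge], c_puct: float = 1.5) -> str:
    """Return the move maximizing Q + c_puct * P * sqrt(sum of N) / (1 + N)."""
    total_visits = sum(e.visits for e in edges.values())

    def puct_score(e: Edge) -> float:
        exploration = c_puct * e.prior * math.sqrt(total_visits) / (1 + e.visits)
        return e.q + exploration

    return max(edges, key=lambda move: puct_score(edges[move]))


def backup(edge: Edge, value: float) -> None:
    """Record one simulation's evaluation on the chosen edge."""
    edge.visits += 1
    edge.value_sum += value


# Hypothetical priors and a fixed table standing in for leaf evaluations.
root = {"D4": Edge(prior=0.5), "Q16": Edge(prior=0.3), "A1": Edge(prior=0.2)}
leaf_value = {"D4": 0.2, "Q16": 0.6, "A1": -0.5}

for _ in range(400):
    move = select(root)
    backup(root[move], leaf_value[move])

# AlphaGo plays the most-visited move at the root.
most_visited = max(root, key=lambda mv: root[mv].visits)
print(most_visited, {mv: e.visits for mv, e in root.items()})
```

Because the exploration term shrinks as a move accumulates visits, the search concentrates on the move with the best backed-up value (here Q16, even though the prior favored D4) while still sampling the others. This is one way a policy prior and a value estimate can work together during search.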

Synced: Another Tencent News report mentioned that “During Fine Art’s development, playing against human players was a critical factor in it becoming a powerful Go player. Fine Art’s breakthroughs always came after it defeated a player of a certain skill level.” We know that in AlphaGo, records of past human games were used to learn the Rollout Policy and the SL Policy, with the latter serving as the initial value of the RL Policy. A good initial value can speed up the learning of better policies, but it does not affect the quality of the final learned policy. Please explain how Fine Art’s improvement “benefited from playing against world-class Go players”.

Liu: During Fine Art’s development, it was very difficult to evaluate its skill level and identify potential problems, and this became even harder once it improved to the point that ordinary players could no longer defeat it. So yes, it is correct that Fine Art’s development benefited greatly from top Go players from around the world. They were a huge help in speeding up development.

Synced: AlphaGo’s training process can be seen as solving an optimization problem that is 100% autonomous, with no human intervention. Tencent’s reporting on Fine Art kept emphasizing the importance of top Go players. How do top Go players fit into the optimization problem? Or were there human-designed rules in Fine Art?

Liu: Human intervention is a thing of the past; Fine Art is an end-to-end decision process. Where the world’s top Go players come in is by helping us analyze the AI’s games to find problems, diagnose their causes, and come up with solutions.

Synced: What is the design strategy behind the Fine Art system? Compared to AlphaGo (which also used Policy Net and Value Net), what are some of Fine Art’s innovations?

Liu: Fine Art was trained on databases of human Go games and by playing against itself. Its algorithm is based on Policy Nets and Value Nets, along with some innovative methods that increased the accuracy of its Value Net, which is why it has such a strong strategic view.

Fine Art’s technical and database details will be released in the future, in the form of academic papers. We hope by taking an open and collaborative stand, we can help and inspire more researchers to advance the development of Go AI.

Fine Art is built on deep learning and reinforcement learning, the two hottest research areas in machine learning right now. Its overall architecture largely follows that of AlphaGo, published in Nature in January 2016. Although it is still a pure machine learning system, we went beyond the paper in our implementation and made our own innovations.

For example, modern reinforcement learning uses advanced machine learning algorithms as a simulator to generate high-quality, effective experience replays through self-play, that is, by having the system play against itself. Through this process, the learning model continuously generates new reinforcement learning data that helps it improve.
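Fine Art’s self-play methods have not been disclosed, but the basic idea of self-play data generation can be sketched on a toy game. This minimal Python sketch assumes a simple subtraction game in place of Go (players alternately take 1 to 3 stones, and whoever takes the last stone wins), with a uniformly random `random_policy` as a hypothetical stand-in for the current policy network. Each self-play game is recorded as (position, player to move, move played, final outcome) tuples. Labeling every position with the eventual winner is the same kind of target AlphaGo used to train its value network; the training step that would consume this data is omitted.

```python
import random
from typing import Callable

# Hypothetical toy setup: a subtraction game stands in for Go, and a random
# policy stands in for the policy network. Not Fine Art's method.
Policy = Callable[[int, random.Random], int]


def random_policy(pile: int, rng: random.Random) -> int:
    """Hypothetical stand-in for the current policy: any legal move, uniformly."""
    return rng.randint(1, min(3, pile))


def self_play_game(pile: int, policy: Policy, rng: random.Random) -> list[tuple[int, int, int, int]]:
    """Play one game where the same policy controls both players.

    Returns (pile_size, player_to_move, move, outcome) tuples, where outcome is
    +1 if the player to move eventually won and -1 if they lost.
    """
    history = []
    player = 0
    while pile > 0:
        move = policy(pile, rng)
        history.append((pile, player, move))
        pile -= move
        player = 1 - player
    winner = 1 - player  # the player who just took the last stone
    return [(s, p, m, 1 if p == winner else -1) for s, p, m in history]


def generate_data(num_games: int, start_pile: int = 10, seed: int = 0) -> list[tuple[int, int, int, int]]:
    """Collect outcome-labeled positions from many self-play games."""
    rng = random.Random(seed)
    data = []
    for _ in range(num_games):
        data.extend(self_play_game(start_pile, random_policy, rng))
    return data


data = generate_data(100)
print(len(data), data[:3])
```

In a full reinforcement learning loop, a policy trained on this data would replace `random_policy` for the next round of games, so the quality of the generated data and the strength of the model improve together.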

While developing Fine Art’s learning model, we discovered some new and highly effective reinforcement learning methods that generate higher-quality simulated data, which led to a more powerful model. For example, compared to many other Go AIs, Fine Art is much better at semeai (the Go term for a “capturing race”). In the process of developing Fine Art, Tencent AI Lab accumulated numerous effective techniques for generating high-quality reinforcement learning data, and these techniques can be used in a wide range of applications.

As for the hardware that everyone wants to know about, Fine Art has a centralized version and a distributed version. In our tests, the centralized version is not far behind the distributed one in strength. The distributed version also uses fewer resources than DeepMind reported for AlphaGo, so we can say that Fine Art did not win because of raw computing power.

Also, during training, Fine Art used high-quality data generated on Tencent’s cloud computing platform. These computing resources are openly available to anyone through Tencent Cloud.

Future Research Areas

Synced: In what real-world applications can Tencent’s Go AI research (in other words, its reinforcement learning technology) be used? Can you give some examples?

Liu: From an application point of view, in the short term, Tencent Go is one of China’s largest and most active online Go platforms, so if we do Go AI well, many people can use it immediately. In the mid term, AI Lab is focusing on four areas of application: Content AI, Social AI, Game AI, and Cloud AI. Go AI goes hand in hand with Game AI and holds considerable potential for Tencent. In the long term, the “precision policy” capability behind Fine Art could also be applied to autonomous vehicles, quantitative finance, or assisting the healthcare industry. If we can move from Go, a symmetric-information game, to asymmetric-information games that can handle the uncertainty of everyday life, it would unlock a world bound only by our imagination. Of course, that is a very long-term application.

When I imagine what the future holds, I don’t think AI will be one mature product; rather, it will be woven into people’s lives, and everyone will be helped by AI in some way.

Synced: Go has long been considered the Holy Grail of symmetric-information games. After AlphaGo’s victory last year, attention has gradually shifted toward asymmetric-information games. There has been quite a leap in poker, where AI has already beaten humans in one-on-one no-limit play, and DeepMind has started looking at StarCraft 2. Does Tencent AI Lab have any current projects on asymmetric-information games? Please tell us about your progress.

Liu: We are indeed working on some interesting research in the Game AI domain, but specific details will be announced at a later date.

Synced: Outside of this, what other areas is AI Lab working on?

Liu: AI Lab’s research is built on four verticals: Computer Vision, Speech Recognition, Natural Language Processing (NLP), and Machine Learning. These four verticals cover the frontier of modern AI research, and each represents a deep, fundamental research direction.

For example, in computer vision, beyond traditional image processing, there is research on Augmented Reality (AR) and Simultaneous Localization and Mapping (SLAM). In speech recognition, beyond traditional speech recognition and synthesis, there is also automatic translation. In NLP, beyond the traditional human behavior research, there is also research on chatbots. In machine learning, our work ranges from supervised learning to unsupervised learning to reinforcement learning.

AI Lab also has four specific research areas based on Tencent’s lines of business: Content AI, Social AI, Game AI, and Cloud AI.

Content AI centers on content recommendation and search applications. For Social AI, social is in Tencent’s DNA: QQ, Qzone and WeChat are all social platforms, so it only makes sense to build AI around social interaction, such as conversations, chatbots and virtual assistants. What makes us stand out from other companies is Game AI. Gaming is a huge market for Tencent, and there is unlimited potential for incorporating AI into games. Could we one day see AI compete in international League of Legends tournaments and make the game more engaging? Then there is Cloud AI, through which we want to make AI capabilities widely available, such as facial recognition based on computer vision, speech recognition, public opinion analysis based on NLP, and deep learning in the cloud.

Synced: Some people say that AI has taken Go to a level never seen before, or that it has unlocked brand new ideas. Tencent’s Fox Go platform even created a 10 dan rank for this, with Fine Art the first to claim it. Do you think developments in AI are going to bring new ideas to human traditions?

Liu: Speaking specifically on Go, Fine Art’s strategic view and certain moves might give human players a lot to think about.

We hope Fine Art can embody a sense of responsibility in technology, using Go AI’s interaction with human players to bring more attention to the traditional game and culture of Go. Tencent AI Lab’s goal is to “Make AI Everywhere”, making AI ubiquitous in the future, so that technology serves people and makes our lives better.

Synced: Fine Art will face 7 dan Ryō Ichiriki in the exhibition match on March 26. What do you think are Fine Art’s chances?

Liu: We are confident, but our goal is to learn and exchange ideas. The value of Go is multidimensional: beyond simply winning and losing, there is culture, art, and entertainment. If Fine Art wins the game, it is not about AI defeating humans, nor science defeating Go. There is no winner or loser here; it’s a win-win for everyone.

 


Original Article from Synced China http://www.jiqizhixin.com/article/2520 | Author: Zenan Li, Wu Pan | Localized by Synced Global Team: Xiang Chen
