AlphaGo and the World of Artificial Intelligence

Sanyam Bhutani


What is AlphaGo?

AlphaGo is a narrow AI: a computer program developed by Google DeepMind to play Go, a two-player strategy board game of Chinese origin that is often compared to chess. AlphaGo was the first AI program to beat a professional human player on a full-sized board without a handicap.

How does AlphaGo training work?

The AlphaGo project started as a test bed to see how well Google DeepMind's deep learning neural networks could compete at Go. The algorithm combines tree search with machine learning, reinforced by extensive training against both human and computer play. It uses Monte Carlo tree search guided by a policy network and a value network, both implemented as deep neural networks. The policy network is trained to predict the next move most likely to lead to a win, which narrows the set of moves the search has to consider. The value network is trained to estimate the likely winner from a given position, so the search does not have to play every line all the way to the end of the game. AlphaGo was first trained on historical games between human players, using a database of around 30 million (3 crore) moves, so that it learned to mimic human play. Once it reached a degree of proficiency, it was trained further by playing against instances of itself, using reinforcement learning to keep improving.
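To make the interplay between the two networks and the tree search more concrete, here is a minimal sketch of how a search could pick which move to explore next. It is an illustration, not AlphaGo's actual code: the function names, the dictionary layout, and the exploration constant c_puct are all assumptions. Each candidate move carries a prior (the kind of probability a policy network would supply) and a running sum of value estimates (the kind of number a value network would supply).

```python
import math


def puct_score(prior, value_sum, visits, parent_visits, c_puct=1.5):
    """Score one candidate move (hypothetical helper, simplified).

    q: average value seen so far for this move (value-network style estimate).
    u: exploration bonus, large for moves the policy likes but that have
       been visited rarely.
    """
    q = value_sum / visits if visits > 0 else 0.0
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + visits)
    return q + u


def select_move(children, c_puct=1.5):
    """Pick the move with the highest score among `children`.

    `children` maps a move name to a dict with keys
    "prior", "value_sum" and "visits" (an assumed layout).
    """
    parent_visits = sum(child["visits"] for child in children.values())
    return max(
        children,
        key=lambda move: puct_score(
            children[move]["prior"],
            children[move]["value_sum"],
            children[move]["visits"],
            parent_visits,
            c_puct,
        ),
    )


if __name__ == "__main__":
    candidates = {
        "D4": {"prior": 0.5, "value_sum": 3.0, "visits": 10},
        "Q16": {"prior": 0.5, "value_sum": 0.0, "visits": 0},
    }
    # The unvisited move gets a large exploration bonus, so it is tried next.
    print(select_move(candidates))
```

The design point is the balance: a move that has already been searched a lot is judged mostly on its average value, while a move the policy rates highly but that has barely been searched gets a bonus so it is not ignored.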

Why is AlphaGo a game changer?

The standard way to program an AI to play a board game is a search tree: on its turn, the computer analyzes the possible continuations and picks the move most likely to result in victory. This is how Deep Blue, IBM's chess computer, played. The only thing Deep Blue can do is play chess, and because chess is constrained enough for search with a hand-tuned evaluation to work, Deep Blue never needed to get smarter. Go has far more possible positions than chess, so DeepMind tried something new in the world of games versus AIs: machine learning and neural networks rather than a purely custom-built search tree. In October 2015, AlphaGo became the first computer program to defeat a professional Go player, the 2-dan Fan Hui. In March 2016, it beat one of the highest-ranked human players in the world, the 9-dan Lee Sedol, winning four games out of five. Thus, AlphaGo became a game changer.
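The classic search-tree approach can be sketched on a game small enough to search completely. The example below uses a toy stone-taking game as a stand-in (it is not from this article): players alternately take 1, 2 or 3 stones, and whoever takes the last stone wins. The function names are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def can_win(stones):
    """Return True if the player to move can force a win with `stones` left.

    The search tries every legal move and asks whether the opponent
    would then be in a losing position.
    """
    return any(
        not can_win(stones - take)
        for take in (1, 2, 3)
        if take <= stones
    )


def best_move(stones):
    """Return a winning number of stones to take, or None if none exists."""
    for take in (1, 2, 3):
        if take <= stones and not can_win(stones - take):
            return take
    return None


if __name__ == "__main__":
    print(best_move(5))  # take 1, leaving the opponent with 4
    print(best_move(4))  # no winning move from 4 stones
```

This works only because the whole tree fits in memory. In Go the number of possible positions is far too large to search exhaustively, which is why learned guidance becomes necessary.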

How can one train their own game-playing AI?

Reinforcement learning (RL) is a type of machine learning in which machines and software agents automatically determine the ideal behavior within a specific context, learning from feedback from the environment in order to maximize their performance. The agent needs only simple reward feedback to learn its behavior; this is known as the reinforcement signal. In RL, a policy p decides which action to take, and a value function v measures how good it is to be in a particular state. In Go, the policy chooses the moves (actions) played to win a game. To model uncertainty, the policy is a probability distribution p(s, a): the chance of playing move a from board position s. The value function is the likelihood of winning from a specific board position.
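As a rough illustration of these two ideas, the sketch below turns a set of move scores into a probability distribution p(s, a) for one position and estimates a position's value as the fraction of games won from it. The scores, move names and function names are made up for the example; in a real system the scores would come from a trained network.

```python
import math
import random


def softmax_policy(scores):
    """Turn raw move scores for one position into probabilities p(s, a)."""
    highest = max(scores.values())
    # Subtracting the highest score keeps the exponentials numerically stable.
    exps = {move: math.exp(s - highest) for move, s in scores.items()}
    total = sum(exps.values())
    return {move: e / total for move, e in exps.items()}


def sample_move(policy, rng=random):
    """Draw one move at random according to the policy's probabilities."""
    moves = list(policy)
    weights = [policy[m] for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]


def estimate_value(outcomes):
    """Estimate the value of a position as the fraction of games won.

    `outcomes` is a list of results from games played from that position,
    with 1 for a win and 0 for a loss.
    """
    return sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    # Hypothetical scores for three candidate moves in one position.
    policy = softmax_policy({"D4": 2.0, "Q16": 1.0, "K10": 0.0})
    print(policy)
    print(sample_move(policy))
    print(estimate_value([1, 0, 1, 1]))
```

Sampling from the policy, rather than always playing the top move, is one simple way to keep some exploration in self-play.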

Learn Reinforcement Learning skills with Udacity's Deep Reinforcement Learning Nanodegree Program.

Read more about AlphaGo here

About the author
Sanyam Bhutani

Sanyam Bhutani is a Deep Learning and Computer Vision practitioner. He has worked on end-to-end AI-based industrial and research projects at Tech Mahindra, ONGC, IIT-Madras, and IIT-Roorkee.