Andrus Development: Updated Reinforcement Learning Tic Tac Toe Game

Saturday, March 11, 2006

Updated Reinforcement Learning Tic Tac Toe Game

I just put version 1.2 of my reinforcement learning tic tac toe game on my development website at: http://williamandrus.tripod.com/RLTTT.html

Increased learning speed and decreased XML file size by implementing rotation and reflection of the states.

Fixed one minor bug in the Monte Carlo learning in recognizing a final state.

Fixed one minor bug in the Temporal Difference learning in updating the state values.
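For readers unfamiliar with the two learning styles named above, here is a minimal sketch of the textbook tabular updates. It is not the code from the game: the class name `ValueUpdates`, the dictionary-based value table, and the default value of 0.5 for unseen states are all assumptions made for illustration, and the sketch assumes a single reward that arrives only when the game ends.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not taken from the game's source.
public static class ValueUpdates
{
    // Assumed starting estimate for states that have not been seen yet.
    public const double DefaultValue = 0.5;

    private static double Get(Dictionary<string, double> values, string state)
    {
        return values.TryGetValue(state, out double v) ? v : DefaultValue;
    }

    // Temporal Difference, TD(0): V(s) <- V(s) + stepSize * (V(s') - V(s)).
    // The caller is expected to store the reward as the value of a final state.
    public static void TdUpdate(Dictionary<string, double> values,
                                string state, string nextState, double stepSize)
    {
        double current = Get(values, state);
        double next = Get(values, nextState);
        values[state] = current + stepSize * (next - current);
    }

    // Constant step-size Monte Carlo: once the game is over, move every
    // visited state toward the final reward.
    public static void MonteCarloUpdate(Dictionary<string, double> values,
                                        IEnumerable<string> visitedStates,
                                        double finalReward, double stepSize)
    {
        foreach (string s in visitedStates)
        {
            double current = Get(values, s);
            values[s] = current + stepSize * (finalReward - current);
        }
    }
}
```

The practical difference is timing: the TD update can run after every move because it only needs the next state's estimate, while the Monte Carlo update has to wait until a final state is recognized.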

12 comments:

Antonio (italian) said...: Hi William Andrus! I am an Italian student. For my thesis I must implement tic tac toe with reinforcement learning. Your solution is perfect, but I need the source code (for example, in Java). Can you help me, please? Sorry for my English.; October 14, 2008 5:04 AM
William Andrus said...: Yeah, no problem.

I put my code at the following location:

http://cid-3ecbca9f307e27b3.skydrive.live.com/self.aspx/Personal%20Programs

Download the TicTacToe.zip folder.; October 15, 2008 8:56 AM
Antonio (italian) said...: Good morning William...!! Thank you for your availability. This code is very important to me. Thank you very much. Good luck to all.; October 16, 2008 12:16 AM
Antonio (italian) said...: Hi William, I am Antonio (the Italian student). I have seen your project and it is perfect in every way, but I must convert it to Java (applet) code for my work. I found Net2Java for NetBeans, which converts code from C# to Java, but the conversion ended with more than 1,000 errors. Do you have any ideas for solving this problem without having to rewrite everything from scratch? Thanks.
PS: I apologize for my insistence.; October 17, 2008 8:28 AM
William Andrus said...: Well, there aren't a whole lot of tools that can convert C# to Java. Actually I think Net2Java is the only one. The best way to convert is by hand, since C# and Java don't differ too much.

Since you are using Net2Java, maybe most of the conversion is already done. Without knowing the errors, I can only assume that they range from simple to complex.

The agent.cs class should be the most important one. This class reads/writes the results to an XML file and computes the value of winning.

The other classes deal with the interface and graphics for the program.; October 17, 2008 12:37 PM
Antonio (italian) said...: Hi William. Sorry if I am bothering you so often. Next Friday I must deliver my thesis, and I must finish the last chapter, where I explain the Reinforcement Learning techniques and algorithms used in the code. I don't understand why the "getMove" method for Monte Carlo compares against only the first 4 states (instead of all 8 states). Is there a particular reason? Could you kindly tell me which algorithms (for example DP: policy iteration or value iteration...) you used for the three methods? Apologies again, and thank you very much for everything.; November 24, 2008 1:18 PM
William Andrus said...: It's been a long time since I programmed this, but I believe one of the changes I made from an older version was to reduce the number of states my program would have to look through. This was done by taking symmetry into account. The 4-state-only check in Monte Carlo is a bug; it should look at all eight states. I must have forgotten to include the other states, as I did in the other 2 learning styles.

I believe the evaluation I do per game is a value iteration. I do allow the user to change the policy variables (epsilon, step size, rewards), allowing for a more varied learning experience.

Also, turn off "Auto Explore New States" when you want to show off the program. I made this to force the agents to explore as many states as possible, but this is not good when you want to play against a human.; November 24, 2008 2:13 PM
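As an illustration of what the epsilon setting mentioned above usually controls, here is a minimal sketch of epsilon-greedy move selection. It is an assumption about the general technique, not the game's actual code: the class `EpsilonGreedy`, the method `ChooseMove`, and the `moveValues` array are hypothetical names, and the only detail borrowed from this thread is that an empty cell is stored as 0.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not taken from the game's source.
public static class EpsilonGreedy
{
    // board: 9 cells, 0 = empty (assumed encoding).
    // moveValues[i]: estimated value of the state reached by playing cell i.
    public static int ChooseMove(int[] board, double[] moveValues,
                                 double epsilon, Random rng)
    {
        var legal = new List<int>();
        for (int i = 0; i < board.Length; i++)
        {
            if (board[i] == 0) legal.Add(i);
        }
        if (legal.Count == 0)
        {
            throw new InvalidOperationException("No legal moves.");
        }

        // Explore: with probability epsilon, pick a random legal move.
        if (rng.NextDouble() < epsilon)
        {
            return legal[rng.Next(legal.Count)];
        }

        // Exploit: otherwise pick the legal move with the largest value.
        int best = legal[0];
        foreach (int i in legal)
        {
            if (moveValues[i] > moveValues[best]) best = i;
        }
        return best;
    }
}
```

With epsilon set to 0 the agent always plays its best known move, which is the behavior you would want for a demonstration against a person. Larger values trade playing strength for exploration.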
Antonio (italian) said...: Sorry, but I didn't quite understand... do you mean I have to check all eight states for Monte Carlo as well?; November 24, 2008 2:31 PM
William Andrus said...: Yes, so it should be something like:
if (state1 == states[0, j] || state2 == states[0, j] ||
    state3 == states[0, j] || state4 == states[0, j] ||
    state5 == states[0, j] || state6 == states[0, j] ||
    state7 == states[0, j] || state8 == states[0, j])
{
    found = true;
    // get the value
    values[i] = double.Parse(this.states[1, j]);
    if (values[i] > largestValue && board[i] == 0)
    {
        largestValue = values[i];
        this.move = i;
    }
}; November 24, 2008 2:38 PM
Antonio (italian) said...: Perfect, I understand. Thank you. PS: In Italy it is midnight, I'm going to sleep, good continuation.; November 24, 2008 2:58 PM
Antonio (italian) said...: I have another question... On what principle did you build state2, state3, ..., state8? Should I also build the following state?

string state9 = (state[0].ToString() + state[1].ToString() + state[2].ToString() + state[3].ToString() + state[4].ToString() + state[5].ToString() + state[6].ToString() + state[7].ToString() + state[8].ToString());

Talk to you soon; November 25, 2008 4:50 AM
William Andrus said...: Well, that would be state1 or just state in some cases.

I believe it first looks for a possible move (state1) then looks at the other possible variations for this state based on symmetry.

So in total it looks at the 8 possible variations that 1 move will be similar to. This is done to look up previous values, since I only store 1 of the 8 variations.; November 25, 2008 8:35 AM
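The principle behind the eight state strings can be sketched as follows: a 3x3 board has four rotations, and each rotation has a mirror image, giving eight equivalent layouts. The code below is only an illustration under assumptions. It uses the 9-character state string seen in this thread (cells 0 to 8, row by row), but the class `BoardSymmetry`, its index maps, and the `Canonical` helper are hypothetical, and the order of the variants need not match state1 through state8 in the original program.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not taken from the game's source.
public static class BoardSymmetry
{
    // Entry k of each map names the source cell for cell k of the new board.
    private static readonly int[] Rotate90 = { 6, 3, 0, 7, 4, 1, 8, 5, 2 };
    private static readonly int[] MirrorLeftRight = { 2, 1, 0, 5, 4, 3, 8, 7, 6 };

    private static string Apply(string state, int[] map)
    {
        char[] result = new char[9];
        for (int k = 0; k < 9; k++)
        {
            result[k] = state[map[k]];
        }
        return new string(result);
    }

    // Returns 8 variants: each of the 4 rotations, plus its mirror image.
    public static List<string> Variants(string state)
    {
        var variants = new List<string>(8);
        string current = state;
        for (int r = 0; r < 4; r++)
        {
            variants.Add(current);
            variants.Add(Apply(current, MirrorLeftRight));
            current = Apply(current, Rotate90);
        }
        return variants;
    }

    // One possible way to store only 1 of the 8: keep the smallest string.
    public static string Canonical(string state)
    {
        string best = state;
        foreach (string v in Variants(state))
        {
            if (string.CompareOrdinal(v, best) < 0) best = v;
        }
        return best;
    }
}
```

For example, `BoardSymmetry.Variants("120000000")` yields eight different strings, while a board with a single mark in a corner yields only four distinct ones because some rotations and mirrors coincide. Looking up `Canonical(state)` in the value table is an alternative to comparing all eight strings against every stored entry.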
