- Increased learning speed and decreased the xml file size by implementing rotation and reflection of the states.
- Fixed one minor bug in the Monte Carlo learning in recognizing a final state
- Fixed one minor bug in the Temporal Difference learning in updating the state values
Hi William Andrus! I am an Italian student. For my thesis I must implement tic tac toe with reinforcement learning, so your solution is perfect, but I need the source code (for example in Java). Can you help me please? Sorry for my English.
Yeah, no problem.
I put my code at the following location:
http://cid-3ecbca9f307e27b3.skydrive.live.com/self.aspx/Personal%20Programs
Download the TicTacToe.zip file.
Good morning William! Thank you for your help. This code is very important to me. Thank you very much. Good luck to all.
Hi William, I am Antonio (an Italian student). I have seen your project and it is perfect in every way, but I must convert it into Java (applet) code for my work. I found Net2Java for NetBeans, which converts code from C# to Java, but at the end of the conversion there were more than 1,000 errors. Do you have any ideas for solving this problem without having to rewrite everything from scratch? Thanks.
PS: Apologies for my insistence.
Well, there aren't a whole lot of tools that can convert C# to Java. Actually I think Net2Java is the only one. The best way to convert is by hand, since C# and Java don't differ too much.
Since you are using Net2Java, maybe a majority of the conversion is already done. Without knowing the errors, I can only assume that they range from simple to complex.
The agent.cs class should be the most important one. This class reads/writes the results to an xml file, and computes the value of winning.
The other classes deal with the interface and graphics for the program.
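To give a rough idea of what agent.cs does with the xml file (this is just a sketch, not the actual code: the element and attribute names here are made up, and the real class keeps the data in a 2-D string array like states[0, j] / states[1, j] rather than a dictionary), loading and saving the state values could look something like this:

using System;
using System.Collections.Generic;
using System.Xml;

// Sketch: persist board-state values to xml so they survive between runs.
public class StateStore
{
    public Dictionary<string, double> Values = new Dictionary<string, double>();

    public void Save(string path)
    {
        var doc = new XmlDocument();
        XmlElement root = doc.CreateElement("states");
        doc.AppendChild(root);
        foreach (var pair in Values)
        {
            XmlElement node = doc.CreateElement("state");
            node.SetAttribute("key", pair.Key);                 // 9-character board encoding
            node.SetAttribute("value", pair.Value.ToString());  // learned value of that state
            root.AppendChild(node);
        }
        doc.Save(path);
    }

    public void Load(string path)
    {
        var doc = new XmlDocument();
        doc.Load(path);
        foreach (XmlElement node in doc.SelectNodes("/states/state"))
            Values[node.GetAttribute("key")] = double.Parse(node.GetAttribute("value"));
    }
}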
Hi William. Sorry if I annoy you so often. Next Friday I must deliver my thesis, and I must finish the last chapter, where I explain the techniques and algorithms used in the code with respect to Reinforcement Learning. I don't understand why, in the getMove method in MonteCarlo, you consider only the first 4 states for comparison (instead of all 8 states). Is there a particular reason? Could you kindly tell me which algorithms (for example DP - policy iteration or value iteration...) you used for the three methods? Excuse me again, and thank you very much for everything.
It's been a long time since I programmed this, but I believe one of the changes I made from an older version was to reduce the number of states my program would have to look through. This was done by taking symmetry into account. The 4-state-only check in Monte Carlo is a bug, and it should look at all eight states. I must have forgotten to include the other states, as I did in the other 2 learning styles.
I believe the evaluation I do per game is a value iteration. I do allow the user to change the policy variables (epsilon, step size, rewards), which allows for a more varied learning experience.
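In sketch form, the idea behind that per-game evaluation is an epsilon-greedy move choice with a step-size weighted value backup. The names epsilon, stepSize, and Values below are mine, and the real code works on the 2-D states array instead of a dictionary, so treat this as an illustration rather than a copy of agent.cs:

using System;
using System.Collections.Generic;

// Sketch: epsilon-greedy move selection with a step-size weighted value backup.
public class ValueUpdateSketch
{
    public Dictionary<string, double> Values = new Dictionary<string, double>();
    private Random rng = new Random();

    // Assumes every candidate state already has an entry in Values
    // (new states would be added with a default value first).
    public string PickMove(string currentState, List<string> candidates,
                           double epsilon, double stepSize)
    {
        if (rng.NextDouble() < epsilon)
        {
            // Explore: take a random move and skip the value update.
            return candidates[rng.Next(candidates.Count)];
        }

        // Exploit: take the move that leads to the highest-valued state.
        string best = candidates[0];
        foreach (string s in candidates)
            if (Values[s] > Values[best]) best = s;

        // Back up part of the greedy choice's value into the current state.
        Values[currentState] += stepSize * (Values[best] - Values[currentState]);
        return best;
    }
}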
Also, turn off "Auto Explore New States" when you want to show off the program. I made this to force the agents to explore as many states as possible, but it is not good when you want to play against a human.
Sorry, but I didn't understand very well... do you mean that I have to check all eight states for the Monte Carlo method as well?
Yes, so it should be something like:
if (state1 == states[0, j] || state2 == states[0, j] || state3 == states[0, j] ||
    state4 == states[0, j] || state5 == states[0, j] || state6 == states[0, j] ||
    state7 == states[0, j] || state8 == states[0, j])
{
    found = true;
    // get the stored value for this state
    values[i] = double.Parse(this.states[1, j]);
    if (values[i] > largestValue && board[i] == 0)
    {
        largestValue = values[i];
        this.move = i;
    }
}
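If I remember right, that check sits inside getMove's loops, where i runs over the nine board squares and j runs over the stored states, so the eight comparisons just ask whether any symmetric version of the candidate board has already been seen.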
Perfect, I understand. Thank you. PS: In Italy it is midnight, I'm going to sleep. All the best.
I have another question... On what principle did you build state2, state3, ..., state8? Should I also build the following state?
string state9 = (state[0].ToString() + state[1].ToString() + state[2].ToString() + state[3].ToString() + state[4].ToString() + state[5].ToString() + state[6].ToString() + state[7].ToString() + state[8].ToString());
Talk to you soon.
Well, that would be state1 or just state in some cases.
I believe it first looks for a possible move (state1) and then looks at the other possible variations of this state based on symmetry.
So in total it is looking at 8 possible variations that one move could be similar to. This is done to look up previous values, since I only store 1 of the 8 variations.
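If it helps for the write-up, here is a sketch of how the eight variations of a 9-character board string can be produced. The helper names are made up for illustration; I believe agent.cs builds state1 through state8 inline as string concatenations instead:

using System;
using System.Collections.Generic;

// Sketch: the 8 symmetric variants of a 3x3 board = 4 rotations x optional mirror.
public static class Symmetry
{
    // Rotate the 3x3 board 90 degrees clockwise.
    static string Rotate(string s)
    {
        var t = new char[9];
        for (int r = 0; r < 3; r++)
            for (int c = 0; c < 3; c++)
                t[3 * r + c] = s[3 * (2 - c) + r];
        return new string(t);
    }

    // Mirror the board left to right.
    static string Mirror(string s)
    {
        var t = new char[9];
        for (int r = 0; r < 3; r++)
            for (int c = 0; c < 3; c++)
                t[3 * r + c] = s[3 * r + (2 - c)];
        return new string(t);
    }

    // Identity, three rotations, and the mirror of each: eight variants in total.
    public static List<string> Variants(string state)
    {
        var result = new List<string>();
        string s = state;
        for (int k = 0; k < 4; k++)
        {
            result.Add(s);
            result.Add(Mirror(s));
            s = Rotate(s);
        }
        return result;
    }
}

Only one of the eight ever gets stored (that is what keeps the xml file small), and at lookup time all eight are compared against the stored states, exactly like the eight comparisons in the snippet above.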