About Me

Northglenn, Colorado, United States
I'm primarily a BI Developer on the Microsoft stack, though I sometimes touch other parts of it as well (web development, application development, and SQL Server development).

Saturday, March 11, 2006

Updated Reinforcement Learning Tic Tac Toe Game

I just put version 1.2 of my reinforcement learning tic tac toe game on my development website at: http://williamandrus.tripod.com/RLTTT.html


  • Increased learning speed and decreased XML file size by implementing rotation and reflection of the states (a sketch of the idea follows this list).

  • Fixed a minor bug in the Monte Carlo learning's recognition of a final state.

  • Fixed a minor bug in the Temporal Difference learning's updating of state values.
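
For anyone curious how the rotation/reflection trick works, here is a minimal sketch of the idea, assuming the board is stored as a 9-character string read left-to-right, top-to-bottom. This is not the code from the download; the class and method names are mine.

using System.Collections.Generic;

static class BoardSymmetry
{
    // Index maps: where each square of the NEW board comes from in the OLD board.
    // Rotate90 is a 90-degree clockwise rotation; Mirror is a left-right reflection.
    static readonly int[] Rotate90 = { 6, 3, 0, 7, 4, 1, 8, 5, 2 };
    static readonly int[] Mirror = { 2, 1, 0, 5, 4, 3, 8, 7, 6 };

    static string Apply(string state, int[] map)
    {
        var result = new char[9];
        for (int i = 0; i < 9; i++)
            result[i] = state[map[i]];
        return new string(result);
    }

    // Yields the 8 symmetric variants of a state:
    // 4 rotations, each paired with its mirror image.
    public static IEnumerable<string> Variants(string state)
    {
        for (int r = 0; r < 4; r++)
        {
            yield return state;
            yield return Apply(state, Mirror);
            state = Apply(state, Rotate90);
        }
    }
}

Since only one canonical form of each state needs to be stored, the saved XML file holds a fraction of the entries it otherwise would.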

12 comments:

Antonio (italian) said...

Hi William Andrus! I am an Italian student. For my thesis I must implement tic tac toe with reinforcement learning. Your solution is perfect, but I need the source code (for example, in Java). Can you help me, please? Sorry for my English.

William Andrus said...

Yeah, no problem.

I put my code at the following location:

http://cid-3ecbca9f307e27b3.skydrive.live.com/self.aspx/Personal%20Programs

Download the TicTacToe.zip file.

Antonio (italian) said...

Good morning William! Thank you for your help. This code is very important to me. Thank you very much. Good luck to all.

Antonio (italian) said...

Hi William, I am Antonio (the Italian student). I have looked at your project and it is excellent in every way, but I must convert it to Java (an applet) for my work. I found Net2Java for NetBeans, which converts code from C# to Java, but at the end of the conversion there were more than 1,000 errors. Do you have any ideas for solving this problem without having to rewrite everything from scratch? Thanks.
PS: Apologies for my insistence.

William Andrus said...

Well, there aren't a whole lot of tools that can convert C# to Java. Actually I think Net2Java is the only one. The best way to convert is by hand, since C# and Java don't differ too much.

Since you are using Net2Java, maybe a majority of the conversion is already done. Without knowing the errors, I can only assume that they range from simple to complex.

The agent.cs class should be the most important one. This class reads and writes the results to an XML file, and computes the value of winning.

The other classes deal with the interface and graphics for the program.
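
Persisting the learned values might look roughly like the sketch below, using System.Xml. The element and attribute names here are assumptions for illustration, not necessarily what agent.cs actually uses.

using System.Xml;

static class StateStore
{
    // Rough sketch of writing the learned state/value table to XML.
    // states[0, j] holds the board string, states[1, j] the value estimate.
    public static void Save(string path, string[,] states, int count)
    {
        var doc = new XmlDocument();
        var root = doc.CreateElement("States");
        doc.AppendChild(root);

        for (int j = 0; j < count; j++)
        {
            var node = doc.CreateElement("State");
            node.SetAttribute("board", states[0, j]);
            node.SetAttribute("value", states[1, j]);
            root.AppendChild(node);
        }

        doc.Save(path);
    }
}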

Antonio (italian) said...

Hi William. Sorry if I am annoying you so often. Next Friday I must deliver my thesis, and I must finish the last chapter, where I explain the techniques and algorithms used in the code with respect to reinforcement learning. I don't understand why, in the getMove method of the Monte Carlo class, you compare against only the first 4 states (instead of all 8 states). Is there a particular reason? Could you kindly tell me which algorithms (for example, DP - policy iteration or value iteration...) you used for the three methods? Excuse me again, and thank you very much for everything.

William Andrus said...

It's been a long time since I programmed this, but I believe one of the changes I made from an older version was to reduce the number of states my program would have to look through. This was done by taking symmetry into account. The 4-state-only check in the Monte Carlo code is a bug; it should look at all eight states. I must have forgotten to include the other states, as I did in the other 2 learning styles.

I believe the evaluation I do per game is a value iteration. I do allow the user to change the policy variables (epsilon, step size, rewards), allowing for a more varied learning experience.

Also, turn off "Auto Explore New States" when you want to show off the program. I made this option to force the agents to explore as many states as possible, but it is not good when you want to play against a human.
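
For your write-up: the per-game backup I describe above is essentially the standard TD(0)-style update, and the policy variables correspond to the usual epsilon-greedy parameters. A rough sketch of both, with illustrative names rather than the program's exact code:

using System;
using System.Collections.Generic;

static class LearningSketch
{
    // TD(0)-style backup: move the current state's value a fraction
    // (the step size, alpha) of the way toward the next state's value.
    public static double UpdateValue(double current, double next, double alpha)
    {
        return current + alpha * (next - current);
    }

    // Epsilon-greedy move selection: with probability epsilon pick a
    // random open square (exploration), otherwise pick the open square
    // with the largest estimated value (exploitation).
    // Assumes at least one open square (board[i] == 0) exists.
    public static int ChooseMove(double[] values, int[] board, double epsilon, Random rng)
    {
        if (rng.NextDouble() < epsilon)
        {
            var open = new List<int>();
            for (int i = 0; i < board.Length; i++)
                if (board[i] == 0) open.Add(i);
            return open[rng.Next(open.Count)];
        }

        int best = -1;
        double bestValue = double.NegativeInfinity;
        for (int i = 0; i < board.Length; i++)
            if (board[i] == 0 && values[i] > bestValue)
            {
                bestValue = values[i];
                best = i;
            }
        return best;
    }
}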

Antonio (italian) said...

Sorry, but I didn't understand very well... do you mean I have to check all eight states for the Monte Carlo method as well?

William Andrus said...

Yes, so it should be something like:
// Check whether any of the 8 symmetric variants of this move's
// state matches the stored state at column j.
if (state1 == states[0, j] || state2 == states[0, j] ||
    state3 == states[0, j] || state4 == states[0, j] ||
    state5 == states[0, j] || state6 == states[0, j] ||
    state7 == states[0, j] || state8 == states[0, j])
{
    found = true;

    // Get the stored value for this state.
    values[i] = double.Parse(this.states[1, j]);

    // Track the best-valued move among the open squares (board[i] == 0).
    if (values[i] > largestValue && board[i] == 0)
    {
        largestValue = values[i];
        this.move = i;
    }
}

Antonio (italian) said...

Perfect, I understand. Thank you. PS: In Italy it is midnight; I'm going to sleep. All the best.

Antonio (italian) said...

I have another question... On what principle did you build state2, state3, ..., state8? Should I also build the following state?

string state9 = (state[0].ToString() + state[1].ToString() + state[2].ToString() + state[3].ToString() + state[4].ToString() + state[5].ToString() + state[6].ToString() + state[7].ToString() + state[8].ToString());

Talk to you soon.

William Andrus said...

Well, that would just be state1 (or simply state, in some cases).

I believe it first looks for a possible move (state1), then looks at the other possible variations of this state based on symmetry.

So in total it is looking at 8 possible variations that 1 move will be similar to. This is done to look up previous values, since I only hold 1 of the 8.
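
Putting that together: each candidate move produces one board string, and the lookup then checks all eight symmetric forms of that string against the stored table. A rough sketch of such a lookup, reusing the BoardSymmetry.Variants helper from the sketch near the top of the post (again, illustrative names rather than the program's own):

static bool TryLookup(string[,] states, int count, string state, out double value)
{
    // Scan the stored table for any of the 8 symmetric variants of this state.
    foreach (string variant in BoardSymmetry.Variants(state))
    {
        for (int j = 0; j < count; j++)
        {
            if (variant == states[0, j])
            {
                value = double.Parse(states[1, j]);
                return true;
            }
        }
    }

    value = 0.0;
    return false;
}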