- Increased learning speed and decreased XML file size by implementing rotation and reflection of the states.
- Fixed a minor bug in the Monte Carlo learning in recognizing a final state
- Fixed a minor bug in the Temporal Difference learning in updating the state values
Interesting programming ideas, solutions, and logic that I have used to solve problems or have come across throughout my career.
About Me
- William Andrus
- Northglenn, Colorado, United States
- I'm primarily a BI developer on the Microsoft stack, though I sometimes touch on other parts of it (web development, application development, and SQL Server development).
Saturday, March 11, 2006
Updated Reinforcement Learning Tic Tac Toe Game
I just put version 1.2 of my reinforcement learning tic tac toe game on my development website at: http://williamandrus.tripod.com/RLTTT.html
12 comments:
Hi William Andrus! I am an Italian student. For my thesis I must implement tic tac toe with reinforcement learning, so your solution is perfect, but I need the source code (in Java, for example). Can you help me, please? Sorry for my English.
Yeah, no problem.
I put my code at the following location:
http://cid-3ecbca9f307e27b3.skydrive.live.com/self.aspx/Personal%20Programs
Download the TicTacToe.zip file.
Good morning William! Thank you for your help. This code is very important to me. Thank you very much. Good luck to all.
Hi William, I am Antonio (an Italian student). I have seen your project and it is very good in every way, but I must convert it to a Java applet for my work. I found Net2Java for NetBeans, which converts code from C# to Java, but the conversion produced more than 1,000 errors. Do you have any ideas for solving this problem without having to rewrite everything from scratch? Thanks.
PS: Apologies for my insistence.
Well, there aren't a whole lot of tools that can convert C# to Java; actually, I think Net2Java is the only one. The best way to convert is by hand, since C# and Java don't differ too much.
Since you are using Net2Java, the majority of the conversion may already be done. Without knowing the errors, I can only assume they range from simple to complex.
The agent.cs class should be the most important one. It reads and writes the results to an XML file, and it computes the value of winning.
The other classes deal with the interface and graphics for the program.
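For illustration, here is a minimal sketch of the kind of read/write agent.cs does (this is not the actual code; the helper class and XML element names are made up):

using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

class StateStore
{
    // Save each state string and its learned value as one <state> element.
    public static void Save(string path, Dictionary<string, double> values)
    {
        var doc = new XDocument(
            new XElement("states",
                values.Select(kv =>
                    new XElement("state",
                        new XAttribute("board", kv.Key),
                        new XAttribute("value", kv.Value)))));
        doc.Save(path);
    }

    // Load the state/value pairs back into a dictionary.
    public static Dictionary<string, double> Load(string path)
    {
        return XDocument.Load(path).Root
            .Elements("state")
            .ToDictionary(
                e => (string)e.Attribute("board"),
                e => (double)e.Attribute("value"));
    }
}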
Hi William. Sorry if I annoy you so often. Next Friday I must deliver my thesis, and I must finish the last chapter, where I explain the techniques and algorithms used in the code with respect to reinforcement learning. I don't understand why, in the getMove method of the Monte Carlo class, you consider only the first 4 states for comparison (instead of all 8 states). Is there a particular reason? Could you kindly tell me which algorithms (for example, DP, policy iteration, or value iteration) you used for the three methods? Excuse me again, and thank you very much for everything.
It's been a long time since I programmed this, but I believe one of the changes I made from an older version was to reduce the number of states my program has to look through. This was done by taking symmetry into account. The 4-state-only check in Monte Carlo is a bug; it should look at all eight states. I must have forgotten to include the other states, as I did in the other two learning styles.
I believe the evaluation I do per game is value iteration. I allow the user to change the policy variables (epsilon, step size, rewards), allowing for a more varied learning experience.
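As a rough sketch of the kind of per-game update and epsilon-greedy move choice I mean (the names here are illustrative, not the actual code, and it assumes a simple TD-style rule):

using System;
using System.Collections.Generic;

class ValueTable
{
    readonly Dictionary<string, double> values = new Dictionary<string, double>();
    readonly Random rng = new Random();

    public double Epsilon = 0.1;   // chance of exploring a random move
    public double StepSize = 0.5;  // how far each estimate moves toward its target

    // Unseen states start at a neutral value between loss (0) and win (1).
    double Get(string state) => values.TryGetValue(state, out var v) ? v : 0.5;

    // Back up the successor state's value (plus any reward) into the current state.
    public void Update(string state, string nextState, double reward)
    {
        double current = Get(state);
        values[state] = current + StepSize * (reward + Get(nextState) - current);
    }

    // Epsilon-greedy: usually pick the highest-valued successor, sometimes explore.
    public string ChooseNext(IList<string> successors)
    {
        if (rng.NextDouble() < Epsilon)
            return successors[rng.Next(successors.Count)];
        string best = successors[0];
        foreach (var s in successors)
            if (Get(s) > Get(best)) best = s;
        return best;
    }
}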
Also, turn off "Auto Explore New States" when you want to show off the program. I made this option to force the agents to explore as many states as possible, but it is not good when a human wants to play against the agent.
Sorry, but I didn't understand very well... do you mean I have to check all eight states in the Monte Carlo method too?
Yes, so it should be something like:
// Check whether any of the eight symmetric encodings of this move matches the stored state in column j.
if (state1 == states[0, j] || state2 == states[0, j] ||
    state3 == states[0, j] || state4 == states[0, j] ||
    state5 == states[0, j] || state6 == states[0, j] ||
    state7 == states[0, j] || state8 == states[0, j])
{
    found = true;
    // Get the stored value for this state.
    values[i] = double.Parse(this.states[1, j]);
    // Keep the highest-valued move among the empty squares.
    if (values[i] > largestValue && board[i] == 0)
    {
        largestValue = values[i];
        this.move = i;
    }
}
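The same check can also be written more compactly by putting the eight encodings in an array (a sketch reusing the names above):

string[] symmetric = { state1, state2, state3, state4, state5, state6, state7, state8 };
if (Array.IndexOf(symmetric, states[0, j]) >= 0)
{
    // ...same body as above
}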
Perfect, I understand. Thank you. PS: In Italy it is midnight; I'm going to sleep. Keep up the good work.
I have another question... On what principle did you build state2, state3, ..., state8? Should I also build the following state?
string state9 = state[0].ToString() + state[1].ToString() + state[2].ToString()
              + state[3].ToString() + state[4].ToString() + state[5].ToString()
              + state[6].ToString() + state[7].ToString() + state[8].ToString();
Talk to you soon!
Well, that would be state1 or just state in some cases.
I believe it first builds the state for a possible move (state1) and then the other possible variations of that state based on symmetry.
So in total it is looking at eight possible variations that one move will be similar to. This is done to look up previously learned values, since I only store one of the eight variants.
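As a sketch of how those eight variations can be generated (not my actual code; the names are illustrative), the four rotations of the board plus the four rotations of its mirror image cover all eight:

using System.Collections.Generic;

class Symmetry
{
    // Rotate the 3x3 board (stored row by row in a 9-element array) 90 degrees clockwise.
    static int[] Rotate(int[] b)
    {
        var r = new int[9];
        for (int row = 0; row < 3; row++)
            for (int col = 0; col < 3; col++)
                r[row * 3 + col] = b[(2 - col) * 3 + row];
        return r;
    }

    // Mirror the board left to right.
    static int[] Reflect(int[] b)
    {
        var r = new int[9];
        for (int row = 0; row < 3; row++)
            for (int col = 0; col < 3; col++)
                r[row * 3 + col] = b[row * 3 + (2 - col)];
        return r;
    }

    // All eight symmetric encodings of one position; only one of them is stored.
    public static IEnumerable<string> Variants(int[] board)
    {
        var b = board;
        for (int i = 0; i < 4; i++)
        {
            yield return string.Concat(b);          // this rotation
            yield return string.Concat(Reflect(b)); // and its mirror image
            b = Rotate(b);
        }
    }
}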