- Increased learning speed and decreased the xml file size by implementing rotation and reflection of the states.
- Fixed one minor bug in the Monte Carlo learning in recognizing a final state
- Fixed one minor bug in the Temporal Difference learning in updating the state values
Hi William Andrus! I am an Italian student. For my thesis I must implement tic tac toe with reinforcement learning, so your solution is perfect, but I need the source code (for example in Java). Can you help me please? Sorry for my English.
Yeah, no problem.
I put my code at the following location:
http://cid-3ecbca9f307e27b3.skydrive.live.com/self.aspx/Personal%20Programs
Download the TicTacToe.zip file.
Good morning William! Thank you for your help. This code is very important to me. Thank you very much. Good luck to all.
Hi William, I am Antonio (an Italian student). I have seen your project and it is perfect in every way, but I must convert it into Java (applet) code for my work. I found Net2Java for NetBeans, which converts code from C# to Java, but at the end of the conversion there were more than 1,000 errors. Do you have any ideas for solving this problem without having to rewrite everything from scratch? Thanks.
PS: Apologies for my insistence.
Well, there aren't a whole lot of tools that can convert C# to Java. Actually I think Net2Java is the only one. The best way to convert is by hand, since C# and Java don't differ too much.
Since you are using Net2Java, maybe a majority of the conversion is already done. Without knowing the errors, I can only assume that they range from simple to complex.
The agent.cs class should be the most important one. This class reads/writes the results to an xml file, and computes the value of winning.
The other classes deal with the interface and graphics for the program.
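To give a rough idea of what agent.cs does with the xml file (this is just a sketch, not the actual code: the element and attribute names here are made up, and the real class keeps the data in a 2-D string array like states[0, j] / states[1, j] rather than a dictionary), loading and saving the state values could look something like this:

using System;
using System.Collections.Generic;
using System.Xml;

// Sketch: persist board-state values to xml so they survive between runs.
public class StateStore
{
    public Dictionary<string, double> Values = new Dictionary<string, double>();

    public void Save(string path)
    {
        var doc = new XmlDocument();
        XmlElement root = doc.CreateElement("states");
        doc.AppendChild(root);
        foreach (var pair in Values)
        {
            XmlElement node = doc.CreateElement("state");
            node.SetAttribute("key", pair.Key);                 // 9-character board encoding
            node.SetAttribute("value", pair.Value.ToString());  // learned value of that state
            root.AppendChild(node);
        }
        doc.Save(path);
    }

    public void Load(string path)
    {
        var doc = new XmlDocument();
        doc.Load(path);
        foreach (XmlElement node in doc.SelectNodes("/states/state"))
            Values[node.GetAttribute("key")] = double.Parse(node.GetAttribute("value"));
    }
}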
Hi William. Sorry if I annoy you so often. Next Friday I must deliver my thesis, and I must finish the last chapter, where I explain the techniques and algorithms used in the code with respect to Reinforcement Learning. I don't understand why, in the getMove method in MonteCarlo, you consider only the first 4 states for comparison (instead of all 8 states). Is there a particular reason? Could you kindly tell me which algorithms (for example DP - policy iteration or value iteration...) you used for the three methods? Excuse me again, and thank you very much for everything.
It's been a long time since I programmed this, but I believe one of the changes I made from an older version was to reduce the number of states my program would have to look through. This was done by taking symmetry into account. The 4-state-only check in Monte Carlo is a bug, and it should look at all eight states. I must have forgotten to include the other states, as I did in the other 2 learning styles.
I believe the evaluation I do per game is a value iteration. I do allow the user to change the policy variables (epsilon, step size, rewards), which allows for a more varied learning experience.
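In sketch form, the idea behind that per-game evaluation is an epsilon-greedy move choice with a step-size weighted value backup. The names epsilon, stepSize, and Values below are mine, and the real code works on the 2-D states array instead of a dictionary, so treat this as an illustration rather than a copy of agent.cs:

using System;
using System.Collections.Generic;

// Sketch: epsilon-greedy move selection with a step-size weighted value backup.
public class ValueUpdateSketch
{
    public Dictionary<string, double> Values = new Dictionary<string, double>();
    private Random rng = new Random();

    // Assumes every candidate state already has an entry in Values
    // (new states would be added with a default value first).
    public string PickMove(string currentState, List<string> candidates,
                           double epsilon, double stepSize)
    {
        if (rng.NextDouble() < epsilon)
        {
            // Explore: take a random move and skip the value update.
            return candidates[rng.Next(candidates.Count)];
        }

        // Exploit: take the move that leads to the highest-valued state.
        string best = candidates[0];
        foreach (string s in candidates)
            if (Values[s] > Values[best]) best = s;

        // Back up part of the greedy choice's value into the current state.
        Values[currentState] += stepSize * (Values[best] - Values[currentState]);
        return best;
    }
}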
Also, turn off "Auto Explore New States" when you want to show off the program. I made this to force the agents to explore as many states as possible, but it is not good when you want to play against a human.
Sorry, but I didn't understand very well... do you mean that I have to check all eight states for the Monte Carlo method as well?
Yes, so it should be something like:
if (state1 == states[0, j] || state2 == states[0, j] || state3 == states[0, j] ||
    state4 == states[0, j] || state5 == states[0, j] || state6 == states[0, j] ||
    state7 == states[0, j] || state8 == states[0, j])
{
    found = true;
    // get the stored value for this state
    values[i] = double.Parse(this.states[1, j]);
    if (values[i] > largestValue && board[i] == 0)
    {
        largestValue = values[i];
        this.move = i;
    }
}
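If I remember right, that check sits inside getMove's loops, where i runs over the nine board squares and j runs over the stored states, so the eight comparisons just ask whether any symmetric version of the candidate board has already been seen.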
Perfect, I understand. Thank you. PS: In Italy it is midnight, I'm going to sleep. All the best.
I have another question... On what principle did you build state2, state3, ..., state8? Should I also build the following state?
string state9 = (state[0].ToString() + state[1].ToString() + state[2].ToString() + state[3].ToString() + state[4].ToString() + state[5].ToString() + state[6].ToString() + state[7].ToString() + state[8].ToString());
Talk to you soon.
Well, that would be state1 or just state in some cases.
I believe it first looks for a possible move (state1) and then looks at the other possible variations of this state based on symmetry.
So in total it is looking at 8 possible variations that one move could be similar to. This is done to look up previous values, since I only store 1 of the 8 variations.
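If it helps for the write-up, here is a sketch of how the eight variations of a 9-character board string can be produced. The helper names are made up for illustration; I believe agent.cs builds state1 through state8 inline as string concatenations instead:

using System;
using System.Collections.Generic;

// Sketch: the 8 symmetric variants of a 3x3 board = 4 rotations x optional mirror.
public static class Symmetry
{
    // Rotate the 3x3 board 90 degrees clockwise.
    static string Rotate(string s)
    {
        var t = new char[9];
        for (int r = 0; r < 3; r++)
            for (int c = 0; c < 3; c++)
                t[3 * r + c] = s[3 * (2 - c) + r];
        return new string(t);
    }

    // Mirror the board left to right.
    static string Mirror(string s)
    {
        var t = new char[9];
        for (int r = 0; r < 3; r++)
            for (int c = 0; c < 3; c++)
                t[3 * r + c] = s[3 * r + (2 - c)];
        return new string(t);
    }

    // Identity, three rotations, and the mirror of each: eight variants in total.
    public static List<string> Variants(string state)
    {
        var result = new List<string>();
        string s = state;
        for (int k = 0; k < 4; k++)
        {
            result.Add(s);
            result.Add(Mirror(s));
            s = Rotate(s);
        }
        return result;
    }
}

Only one of the eight ever gets stored (that is what keeps the xml file small), and at lookup time all eight are compared against the stored states, exactly like the eight comparisons in the snippet above.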