Problem 1: What is the reward scheme for FrozenLake?
The grid tiles are of four types, {S, F, H, G}, representing the starting point, frozen surface, holes, and the goal. The agent is rewarded only for reaching the goal, giving the reward scheme:
R(s, a) = 1 if taking action a in state s reaches the goal G
        = 0 otherwise
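As a sanity check, this reward scheme can be observed by stepping through the environment with a random policy. Below is a minimal sketch assuming the gymnasium FrozenLake-v1 environment; older gym versions return four values from step() instead of five.

import gymnasium as gym

env = gym.make("FrozenLake-v1")
obs, info = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # random policy
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    print(obs, reward)  # reward is 1.0 only on the step that reaches G
env.close()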
Problem 2: How big is the table used for Q-learning above?
To learn the long-term expected rewards, we use Q-learning, which learns the value of taking each action in each state. The table stores one value per state-action pair. The 4x4 FrozenLake grid has 16 states (one per tile), and there are 4 actions (up, down, left, right), so the table holds a total of 16*4 = 64 values.
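In code, the table is simply a 16-by-4 array of state-action values. A minimal sketch, assuming the gymnasium FrozenLake-v1 environment:

import gymnasium as gym
import numpy as np

env = gym.make("FrozenLake-v1")
n_states = env.observation_space.n   # 16 tiles on the 4x4 grid
n_actions = env.action_space.n       # 4 moves
Q = np.zeros((n_states, n_actions))  # the Q-table
print(Q.size)                        # 64 = 16 * 4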
Problem 3: Did the rewards improve over time? Why or why not?
The rewards did not improve much over time; instead, the agent performs well almost immediately. This is likely because the Q-table is very small (only 64 entries), so the agent can explore and learn the entire state-action space within a few episodes.
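To check this empirically, one can record the total reward of each episode and compare early episodes with late ones. Below is a minimal tabular Q-learning sketch, assuming the gymnasium FrozenLake-v1 environment; the hyperparameter values (alpha, gamma, epsilon, n_episodes) are illustrative, not the assignment's originals.

import gymnasium as gym
import numpy as np

def train(alpha=0.8, gamma=0.95, epsilon=0.1, n_episodes=2000):
    """Tabular Q-learning on FrozenLake; returns per-episode total rewards."""
    env = gym.make("FrozenLake-v1")
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    rewards = []
    for _ in range(n_episodes):
        state, _ = env.reset()
        done, total = False, 0.0
        while not done:
            # epsilon-greedy exploration
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # standard Q-learning update
            Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
            state = next_state
            total += reward
        rewards.append(total)
    env.close()
    return rewards

rewards = train()
# compare average reward in the first and last 100 episodes
print(np.mean(rewards[:100]), np.mean(rewards[-100:]))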
Problem 4: In the above code segment, we provided some values for the learning rate and discount factor. Try different values for the learning rate. What do you observe?
For instance, decreasing the discount factor reduces the time the program takes to finish and converge to a final solution. For small learning-rate values, each update changes the Q-table only slightly, so the model learns more slowly and does not perform as well.
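To see the effect of the learning rate directly, one can rerun training with several values of alpha and compare outcomes. The sketch below reuses the hypothetical train() helper defined after Problem 3; the alpha values are arbitrary examples.

import numpy as np

for alpha in (0.01, 0.1, 0.5, 0.9):
    rewards = train(alpha=alpha)
    print(f"alpha={alpha}: mean reward over last 100 episodes = {np.mean(rewards[-100:]):.2f}")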