Reinforcement Learning example for tree search

I have been learning Reinforcement Learning for about two weeks. Although I haven’t gone through all of Arthur Juliani’s course yet, I have been able to write a small Q-learning example.
This example uses a neural network in place of the Q-value table to solve a path-finding problem. Actually, the graph to search looks more like a tree:

The start point is ‘0’, and the destination (or ‘goal’) is ‘12’.

The code framework of my example comes mainly from Manuel Amunategui’s tutorial, but it replaces the Q-value table with a one-layer neural network.
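
To make the idea concrete, here is a minimal sketch of Q-learning where a single linear layer stands in for the Q-value table. It is not the original code: the tree edges, the reward of 100 at the goal, the hyperparameters and the function names (build_neighbors, train, greedy_path) are all assumptions made for illustration, and it uses plain NumPy rather than any particular deep learning library. The state is fed in as a one-hot vector, and each output of the layer is the Q-value of moving to one node. The sketch explores with uniformly random moves, which works because Q-learning is off-policy.

```python
import numpy as np

# Hypothetical tree: the original figure is not available, so these edges are
# an assumption. Only the start node (0) and the goal (12) come from the post.
EDGES = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6),
         (3, 7), (4, 8), (5, 9), (5, 10), (6, 11), (6, 12)]
N_NODES = 13
GOAL = 12


def build_neighbors(edges, n_nodes):
    """Return an adjacency list for an undirected graph."""
    nbrs = [[] for _ in range(n_nodes)]
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    return nbrs


def train(edges=EDGES, n_nodes=N_NODES, goal=GOAL,
          episodes=500, max_steps=50, gamma=0.8, lr=0.5, seed=0):
    """Q-learning with one linear layer: q = one_hot(state) @ W."""
    rng = np.random.default_rng(seed)
    nbrs = build_neighbors(edges, n_nodes)
    eye = np.eye(n_nodes)
    W = np.zeros((n_nodes, n_nodes))  # layer weights (assumed zero init)
    for _ in range(episodes):
        s = int(rng.integers(n_nodes))  # random start state
        for _ in range(max_steps):
            if s == goal:
                break
            x = eye[s]                   # one-hot input
            q = x @ W                    # Q-values for every "move to node j"
            a = int(rng.choice(nbrs[s])) # random valid move (pure exploration)
            reward = 100.0 if a == goal else 0.0  # assumed reward scheme
            if a == goal:
                target = reward
            else:
                q_next = eye[a] @ W
                target = reward + gamma * max(q_next[j] for j in nbrs[a])
            # Gradient step on the squared error for the chosen output only.
            W[:, a] += lr * (target - q[a]) * x
            s = a
    return W, nbrs


def greedy_path(W, nbrs, start=0, goal=GOAL, max_steps=20):
    """Follow the highest-valued valid move from start until the goal."""
    path = [start]
    s = start
    while s != goal and len(path) <= max_steps:
        s = max(nbrs[s], key=lambda j: W[s, j])
        path.append(s)
    return path


if __name__ == "__main__":
    W, nbrs = train()
    print(greedy_path(W, nbrs))
```

With one-hot inputs and no hidden layer, this network holds the same information as a Q-value table, so it is mostly useful as a stepping stone toward a real multi-layer network. In the assumed tree above, the only simple path from 0 to 12 goes through nodes 2 and 6, so the greedy walk should recover that path after training.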

The reward curve over the training steps:

Finally, the example reports:

which is the correct answer.