The previous examples show the value of Q-learning in obtaining an optimal policy. While these trivial examples were contrived to clearly show the difference between an optimal and greedy policy, we show that reinforcement learning can be applied to large arbitrarily complex P2P networks.
We read in a physical network topology previously generated with BRITE. It then randomly assigns each node a unique key and builds each node's routing table.
The simulation proceeds in a series of lookup iterations. On each iteration the simulator picks a random node to lookup a random key unless temporal key lookups are enabled. For temporal key lookups random nodes lookup a key from a set of keys K. Each key of K is looked up sequentially in a round-robin fashion.
We use the BRITE topology generator to build large physical topology trivially representative of a real wide-area network. All of the simulations use a topology consisting of 100 autonomous systems each with 100 routers for a total of 10,000 nodes. In each simulation, we run through 50,000 iterations of networks with 10,000 nodes randomly distributed through a 24 bit key space. We select a 24 bit key space to allow for a sufficiently large network that is easily representable. A larger key space in conjunction with a relatively spare node population would imply larger routing tables in the Chord protocol and would not emphasize the learning in our model.
CONTINUED...