We next turn to the more interesting problem of optimizing lookup in the
overlay with respect to the underlying physical network latencies. The
latencies between pairs of nodes in the overlay are invisible to the
overlay and must thus be explored directly. Again, we assume that nodes in
the P2P overlay are uniformly distributed with no knowledge of the
physical network. Thus, a lookup might start from a node in North America,
forward the request to Japan only to have the request forwarded back to
the United States. To provide an intuitive example, consider the same P2P
overlay configuration as in the previous section (Figure 1).
Next, consider an underlying connected IP topology graph as in Figure
8. Each node with
in the overlay keyspace is connected
to the router (vertex) in the topology graph with
label
. Each edge of the topology graph is labeled with the latency
between the two nodes.
From this topology, we see that the greedy approach leads to a latency of 22. We apply the same Q-learning algorithm as before but use latency as the cost reward function. As before we continually ask node 0 to lookup a key belonging to node 7. Figure 9 shows node 0's Q-learning table for destination key 7 as a function of iteration. We see that node 0 converges on selecting node 2 as the optimal next hop when trying to reach key 7 in order to minimize latency.
Figure 10 shows the latency of each lookup versus iteration. Again, after an initial period of exploration, the algorithm settles down to an optimal policy. Figure 11 shows the latency improvement of the learned routing policy versus the greedy policy as a function of lookup iteration.