Thursday, January 20, 2022

Random walks on graphs

In order to better understand the random walk based algorithm of [cite 1] for $k$-gossip, I will be exploring the concept of random walks on both static and dynamic graphs.

Section 1: Markov Chains

A Markov chain can be thought of as a directed graph where edge $(u,v)$ has a probability label $P_{uv}$ denoting the probability that we move from node/state $u$ to node/state $v$. We can encode this directed graph in a transition matrix $P$, where $P_{ij}$ is the probability label of the edge $(i,j)$ in the graph.

Equivalently, we can define a Markov chain as a sequence of random variables $\{X_t\}_{t\geq 0}$, where $X_t$ is the state of the chain at time $t$. Then we can say that $P_{ij}=\mathbb{P}[X_t=j | X_{t-1}=i]$, the same for every $t$.

The defining characteristic of a Markov chain is that it is "memoryless", i.e. the distribution of the chain at time $t$ depends only on the value of the chain at time $t-1$. So the history of state transitions before time $t-1$ is already captured by $X_{t-1}$. This is formally written as
$$\mathbb{P}[X_t=a_t | X_{t-1}=a_{t-1}, X_{t-2}=a_{t-2}, ..., X_0=a_0] = \mathbb{P}[X_t=a_t | X_{t-1}=a_{t-1}]$$
To get any one-step transition probability ($P_{ij}$) we can just read from the matrix $P$, but what about a multi-step transition probability? We want to compute $\mathbb{P}[X_t=j | X_0=i]$. We can write this as
$$\mathbb{P}[X_t=j | X_0=i] = \sum_{k}{\mathbb{P}[X_1=k | X_0=i] \cdot \mathbb{P}[X_t=j | X_1=k]} = \sum_{k}{P_{ik}\cdot \mathbb{P}[X_t=j | X_1=k]}$$
Therefore, we can inductively prove that $$\mathbb{P}[X_t=j | X_0=i] = (P^{t})_{ij}$$ i.e. the $(i,j)$ entry of the matrix power $P^t$.
Now suppose that the probabilities of being in each state at time $t$ are given as a row vector $p(t)$. We can extend our result above to show that 
$$p(t)=p(t-1)\cdot P=p(0)\cdot P^t$$
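
To make this concrete, here is a minimal numerical sketch (assuming NumPy and a made-up 3-state chain) that checks $p(t)=p(0)\cdot P^t$ by stepping the distribution forward one step at a time.

```python
import numpy as np

# A made-up 3-state transition matrix; each row sums to 1.
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])

p = np.array([1.0, 0.0, 0.0])   # p(0): start in state 0 with probability 1

t = 10
for _ in range(t):
    p = p @ P                   # p(s) = p(s-1) * P

# Same thing in one shot: p(t) = p(0) * P^t
p_direct = np.array([1.0, 0.0, 0.0]) @ np.linalg.matrix_power(P, t)
print(np.allclose(p, p_direct))  # True
```
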
It is natural to ask if $p(t)$ converges to some invariant/stationary distribution, which would indicate that the Markov chain reaches some sort of equilibrium. Indeed, we can compute this invariant distribution $\pi$ by solving the following
$$\pi=\pi P$$
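
One way to actually compute $\pi$ (again just a sketch, assuming NumPy and the same made-up chain as above) is to take a left eigenvector of $P$ with eigenvalue $1$ and normalise it so its entries sum to $1$.

```python
import numpy as np

P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])

# pi = pi * P means pi is a left eigenvector of P with eigenvalue 1,
# i.e. a right eigenvector of P.T; normalise it to sum to 1.
eigvals, eigvecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(eigvals - 1.0))    # eigenvalue closest to 1
pi = np.real(eigvecs[:, k])
pi = pi / pi.sum()

print(np.allclose(pi @ P, pi))          # True: pi is invariant
```
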
The following theorem gives sufficient conditions for a Markov chain on $n$ states to converge to a unique invariant distribution.

Thm 1: Any finite, strongly connected and aperiodic Markov chain has the following properties:
  1. the chain has a unique invariant distribution $\pi=(\pi_1,\pi_2, ..., \pi_n)$
  2. for all $i$ and $j$, the limit $\lim_{t\to\infty}{(P^{t})_{j,i}}$ exists and is independent of $j$
  3. $\pi_i=\lim_{t\to\infty}{(P^{t})_{j,i}}=1/h_{i,i}$, where $h_{i,j}$ is the expected time for the Markov chain to go from state $i$ to state $j$.
We need the Markov chain to be strongly connected because if we have a Markov chain with two strongly connected components, the chain will have two invariant distributions: one for each component. A Markov chain is aperiodic if the $\gcd$ of the lengths of all walks from a state back to itself is equal to $1$, i.e. there is no period which dictates when we can be in a state. If the Markov chain is periodic, then it will not converge to $\pi$, because the distribution will flip-flop its value based on the periodicity of the chain.
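
As a tiny illustration of the periodic case (a sketch with a made-up two-state chain), consider the chain that deterministically swaps state every step: $\pi=(1/2,1/2)$ satisfies $\pi=\pi P$, but $p(t)$ just flip-flops and never converges to it.

```python
import numpy as np

# Period-2 chain: from state 0 it always moves to 1, and vice versa.
P = np.array([
    [0.0, 1.0],
    [1.0, 0.0],
])

p = np.array([1.0, 0.0])        # start in state 0
for t in range(1, 6):
    p = p @ P
    print(t, p)                 # alternates between (0,1) and (1,0)

# (0.5, 0.5) is still invariant, but p(t) never approaches it.
print(np.allclose(np.array([0.5, 0.5]) @ P, [0.5, 0.5]))  # True
```
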

The theorem says that $\pi_i$ is the limiting probability of being in state $i$ infinitely far in the future, and that this limit is independent of the start state $j$. So if we run the chain long enough, the start state does not impact the long-term behavior of the chain. Intuitively, if we expect to take $h_{i,i}$ steps on average to return from state $i$ back to $i$, then we expect to be in state $i$ a $1/h_{i,i}$ fraction of the time.

Section 2: Random walks on undirected static graphs

A random walk on an undirected static graph $G$ is a sequence of moves of a particle between the vertices of $G$. If the particle is at vertex $u$, then it moves to each of $u$'s neighbors with probability $1/\text{deg}(u)$. We can see that this is a Markov chain where $X_t$ is a random variable denoting the node the particle occupies at time $t$.
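
Concretely, if $A$ is the adjacency matrix of $G$ and $D$ the diagonal matrix of degrees, the walk's transition matrix is $P=D^{-1}A$. Below is a minimal sketch (assuming NumPy and a small made-up graph).

```python
import numpy as np

# Adjacency matrix of a small example graph with edges 0-1, 1-2, 1-3, 2-3.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

deg = A.sum(axis=1)       # deg(u) = number of neighbours of u
P = A / deg[:, None]      # P[u][v] = 1/deg(u) if {u,v} is an edge, else 0

print(P.sum(axis=1))      # each row sums to 1, so P is a valid transition matrix
```
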

Since we want to avoid periodic Markov chains, we want to make sure that the graph we perform the random walk on does not force the return times to a node to all be multiples of some constant greater than $1$.

Bipartite graphs are therefore no good: if we are at a node on one side, we can only return to nodes on that side after an even number of steps, and reach the other side only after an odd number of steps. This means that the random walk will be periodic (with period $2$) and thus will not converge to the invariant distribution. This motivates the following theorem.

Thm 2: A random walk on an undirected graph $G$ is aperiodic if and only if $G$ is not bipartite.

proof. The forward direction was proven by the above paragraph, since we concluded that a random walk on a bipartite graph is periodic. We now need to show that if $G$ is not bipartite, the walk is aperiodic. Suppose $G$ is not bipartite; then it contains a cycle of odd length. Every node has a closed walk of length $2$ (to a neighbor and back), and, since $G$ is connected, every node also has a closed walk of odd length (walk to the odd cycle, go around it once, and walk back). Since $\gcd(2, \text{odd})=1$, the Markov chain is aperiodic.
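
To see the theorem play out numerically, here is a rough sketch (assuming NumPy; the helper `period` is just a made-up name for illustration): node $i$ has a closed walk of length $t$ exactly when $(A^t)_{ii}>0$, so we can take the $\gcd$ of such lengths. It comes out as $2$ on a $4$-cycle (bipartite) and $1$ on a triangle.

```python
import numpy as np
from math import gcd
from functools import reduce

def period(A, i, t_max=12):
    """gcd of the lengths t <= t_max of closed walks from node i back to itself."""
    lengths = []
    M = np.eye(len(A))
    for t in range(1, t_max + 1):
        M = M @ A                  # M is now A^t
        if M[i, i] > 0:
            lengths.append(t)
    return reduce(gcd, lengths)

C4 = np.array([[0, 1, 0, 1],       # 4-cycle: bipartite
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [1, 0, 1, 0]], dtype=float)
K3 = np.array([[0, 1, 1],          # triangle: has an odd cycle
               [1, 0, 1],
               [1, 1, 0]], dtype=float)

print(period(C4, 0))  # 2 -> the walk is periodic
print(period(K3, 0))  # 1 -> the walk is aperiodic
```
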

So if we work with a connected, non-bipartite graph $G$, we know that a random walk on $G$ will converge to the invariant distribution. What is this invariant distribution? By looking at $\pi=\pi P$ and utilising the fact that $P_{ij}=1/\text{deg}(i)$ for every neighbor $j$ of $i$, we get that $\pi_i=\text{deg}(i)/2m$, where $m$ is the number of edges. Intuitively, this makes sense, since the more edges a node has connected to it, the more ways the random walk can end up at the node.
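
We can check this guess directly: for any node $j$,
$$(\pi P)_j = \sum_{i:\{i,j\}\in E}{\frac{\deg(i)}{2m}\cdot\frac{1}{\deg(i)}} = \frac{\deg(j)}{2m} = \pi_j$$
and since $\sum_{i}{\deg(i)}=2m$, the entries of $\pi$ sum to $1$, so it is indeed the invariant distribution.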

(As an aside, this gave me the idea of studying the mixing time of random walks against an adaptive adversary. The adversary will make the graph bipartite in each round because this means the random walk will be periodic and so will not converge to the invariant distribution. But if we introduce smoothing, we can probably make it non-bipartite with high probability.)

We now introduce some interesting values of random walks which we want to study.

The hitting time $h_{u,v}$ was introduced previously in terms of Markov chains. In the context of random walks, it is the expected time for the random walk to go from node $u$ to node $v$. The commute time between $u$ and $v$ is given by $h_{u,v} + h_{v,u}$.

The cover time is the maximum, over all starting vertices, of the expected time for the random walk to visit all vertices.
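
A quick way to get a feel for these quantities is to simulate the walk. Below is a rough Monte Carlo sketch (assuming NumPy and the small example graph from before; `hitting_time` and `cover_time_from` are made-up helper names) that estimates $h_{u,v}$ and the expected time to cover the graph from one fixed start vertex (the cover time itself is the maximum of the latter over all starts).

```python
import numpy as np

rng = np.random.default_rng(0)

# Same small example graph as before: edges 0-1, 1-2, 1-3, 2-3.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
neighbours = [np.flatnonzero(A[u]) for u in range(len(A))]

def hitting_time(u, v, trials=10_000):
    """Monte Carlo estimate of h_{u,v}: expected steps from u until v is first reached."""
    total = 0
    for _ in range(trials):
        x, steps = u, 0
        while x != v:
            x = rng.choice(neighbours[x])
            steps += 1
        total += steps
    return total / trials

def cover_time_from(u, trials=2_000):
    """Monte Carlo estimate of the expected time to visit every vertex, starting at u."""
    n, total = len(A), 0
    for _ in range(trials):
        x, seen, steps = u, {u}, 0
        while len(seen) < n:
            x = rng.choice(neighbours[x])
            seen.add(int(x))
            steps += 1
        total += steps
    return total / trials

print(hitting_time(0, 3))   # estimate of h_{0,3}
print(cover_time_from(0))   # expected cover time starting from vertex 0
```
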

Lemma 1: If $(u,v)\in E$ then $h_{u,v}+h_{v,u}\leq 2m$

We can justify this intuitively using the fact that the stationary distribution is $\pi_u=\deg(u)/2m$: we expect to be at $u$ for this fraction of the time, so on average the time between consecutive visits to $u$ is $2m/\deg(u)$. More carefully, one can view the walk as a walk over the $2m$ directed edges, each equally likely in the stationary regime, so the expected time between consecutive traversals of the directed edge $(v,u)$ is $2m$; in that time the walk, starting at $u$, must first reach $v$ and then return to $u$, giving $h_{u,v}+h_{v,u}\leq 2m$.

Lemma 2: The cover time is $\leq 2mn$

Since our graph is connected, there must exist a spanning tree $T$. Visiting the nodes in depth-first-search order along $T$ and returning to the start gives a closed walk $u, w_1, w_2, ..., w_r, u$ of length $2(n-1)$ that visits every vertex and traverses each tree edge exactly once in each direction. Then we have
$$\mathbb{E}[\text{steps to visit all vertices}]\leq h_{u,w_1} + h_{w_1,w_2} + \cdots + h_{w_r,u}$$
The right-hand side is, by linearity of expectation, the expected time to visit the vertices in exactly this order; pairing the two traversals of each tree edge and applying Lemma 1, it is at most $(n-1)\cdot 2m\leq 2mn$.

We are now interested in another property of Markov chains: the mixing time. This tells us how quickly a Markov chain converges to the invariant distribution, which is useful, for example, when sampling nodes in a graph. We will cover this in the next post.


