Monday, January 31, 2022

Hitting time analysis on smoothed dynamic graphs

The random walk can be a very useful tool in distributed algorithms. It is lightweight, simple, and local in the sense that no global information is needed. Additionally, since the process is random, one might intuitively expect it to be robust to dynamic changes in the graph. We explore to what extent this is true.

Theorem 1: There exists a dynamic graph $\mathcal{G}=(V, \mathcal{E})$ for which the maximum hitting time is $2^{\Omega(n)}$.

proof idea. We construct a dynamic graph $\mathcal{G}$ which is a star over $n$ vertices. Vertex $n$ remains a leaf, while the other vertices take turns being the center, moving "clockwise" through that role. Thus, after exiting the center, a vertex is a leaf for $n-2$ rounds. If the random walk is at a leaf, it has two options: move to the center or stay at the same leaf. If it moves to the center, that vertex becomes a leaf in the next round, getting us nowhere. So to make progress, the walk must stay at the same leaf until it is that vertex's turn to become the center, i.e. for $n-2$ consecutive rounds, which happens with probability $1/2^{n-2}$. We therefore expect this sequence of moves to occur once every $2^{n-2}$ steps, giving the theorem.
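
To get a feel for the construction, here is a minimal Python sketch. It assumes the move rule implied above (at a leaf, the walk stays or moves to the current center with probability $1/2$ each; at the center, it moves to a uniformly random leaf) and takes the permanent leaf as the target; the names are mine, not from any paper.

    import random

    def hit_time_dynamic_star(n, max_steps=10**7):
        """One trial: rounds until the walk reaches the permanent leaf n-1
        on the rotating star (vertices 0..n-2 take turns as the center)."""
        w = 0  # start the walk at vertex 0
        for r in range(max_steps):
            if w == n - 1:
                return r
            c = r % (n - 1)  # center of round r
            if w == c:  # at the center: move to a uniformly random leaf
                w = random.choice([v for v in range(n) if v != c])
            elif random.random() < 0.5:  # at a leaf: move to the center...
                w = c                    # ...or stay (the other half of the time)
        return max_steps

    n = 9
    print(sum(hit_time_dynamic_star(n) for _ in range(200)) / 200)

Averaging a few hundred trials for increasing $n$ shows the doubling behaviour the theorem predicts.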

This is quite discouraging, since it suggests that random walks are no longer feasible as graph exploration methods in dynamic graphs. However, we can analyse the hitting time under $k$-smoothing and see whether this improves matters.

We can show that this exponential lower bound on hitting time (and hence cover time) is fragile in the sense that if we add a small number of random edges to our worst-case graph, the resulting graph has polynomial hitting time.

Theorem 2: The maximum hitting time $H(u,v)$ of a random walk on any $k$-smoothed dynamic graph is $O(n^3/k)$.

proof. Suppose the random walk is at vertex $w$. If the edge $\{w,v\}$ is present in the current graph, then with probability at least $1-k/n^2$ it will remain an edge under smoothing. If there is no such edge, then smoothing adds the edge $\{w,v\}$ with probability at least $k/n^2$. In either case (assuming $k\leq n^2/2$), the edge $\{w,v\}$ is in the smoothed graph with probability at least $k/n^2$. Conditioned on the edge being present, the walk moves to $v$ with probability at least $1/n$. Hence each step reaches $v$ with probability at least $k/n^3$, so the number of steps until the walk hits $v$ is dominated by a geometric random variable with success probability $k/n^3$, giving an expected hitting time of $O(n^3/k)$.
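
The same argument suggests a quick Monte Carlo check. The sketch below uses a simplified stand-in for $k$-smoothing ($k$ uniformly random potential edges are added on top of the current graph each round, and the walk is lazy everywhere); it illustrates the $O(n^3/k)$ behaviour, and is not the exact model from the literature.

    import random

    def smoothed_hit_time(n, k, max_steps=10**6):
        """One trial on the rotating star with k random extra edges per round."""
        w = 0
        for r in range(max_steps):
            if w == n - 1:
                return r
            c = r % (n - 1)
            # neighbours of w in the base star of round r
            nbrs = {v for v in range(n) if v != c} if w == c else {c}
            # simplified smoothing: k uniformly random potential edges
            for _ in range(k):
                a, b = random.sample(range(n), 2)
                if a == w:
                    nbrs.add(b)
                elif b == w:
                    nbrs.add(a)
            if random.random() < 0.5:  # lazy step: stay half the time
                w = random.choice(list(nbrs))
        return max_steps

    for k in (1, 4, 16):
        print(k, sum(smoothed_hit_time(12, k) for _ in range(50)) / 50)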

We can also give a lower bound on the maximum hitting time.

Theorem 3: The maximum hitting time $H(u,v)$ of a random walk on a $k$-smoothed dynamic graph is $\Omega(n^{5/2}/(\sqrt{k}\log{n}))$.

The proof is omitted.


Sunday, January 30, 2022

Distributed random walk algorithm

Atish Das Sarma, Anisur Molla, and Gopal Pandurangan give an efficient distributed algorithm for constructing a random walk from some source node $s$. The purpose is to perform a random walk of sufficient length that its last node is sampled according to the invariant distribution.

They study this problem in the context of dynamic graphs $\mathcal{G}$ which are:
  • connected (otherwise the random walk will never converge to the invariant distribution),
  • $d$-regular (this ensures the invariant distribution is the same for each $G_t\in \mathcal{G}$: it remains the uniform distribution),
  • non-bipartite (this ensures the random walk is aperiodic and therefore converges to the invariant distribution).
Additionally, they analyse the problem in the CONGEST$(\log^2{n})$ communication model. This is done to simplify the analysis of their algorithm. However, we aim to generalise their results to the CONGEST$(\log{n})$ model.

The main idea of their algorithm is to perform many shorter walks of length $\lambda$ (to be specified later) in parallel and later stitch them together to form the walk of the desired length. 

More specifically, each node $v$ begins by initiating $d=\deg(v)$ random walks of length $\lambda$ by forwarding $d$ "walker" tokens uniformly at random. Once all $nd$ random walks have completed, we have constructed many short random walks that are not yet linked; importantly, they are independent of one another. In the next phase, we want to stitch them together.

To stitch, we first make the source node $s$ uniformly sample one of the $d$ random walks of length $\lambda$ that it created in the first phase.

We can flood the endpoints of each random walk across the graph in dynamic-diameter time. Now each node knows the endpoint of each of the $d$ random walks of length $\lambda$ that it created. The current connector uniformly picks one of its walks, say with endpoint $v$, deletes it from its list, and tells $v$ to repeat the procedure. In this way, we construct random walks of length $\lambda, 2\lambda, 3\lambda,...$. This is great! Since we want a random walk of length $\tau$ (the dynamic mixing time), we can repeat this process $\tau/\lambda$ times.
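
A centralised sketch makes the two phases concrete. All names and the `neighbours` oracle below are illustrative, not from the paper (congestion and message passing are abstracted away); `neighbours(v, r)` is assumed to return the neighbour list of $v$ in $G_r$.

    import random

    def short_walks(nodes, deg, lam, neighbours):
        """Phase 1: every node starts deg walker tokens; each token takes one
        uniform step per round for lam rounds. Returns, per node, the list of
        endpoints of the short walks it initiated."""
        tokens = [(u, u) for u in nodes for _ in range(deg)]  # (owner, position)
        for r in range(lam):
            tokens = [(u, random.choice(neighbours(pos, r))) for (u, pos) in tokens]
        endpoints = {u: [] for u in nodes}
        for u, pos in tokens:
            endpoints[u].append(pos)
        return endpoints

    def stitch(s, endpoints, segments):
        """Phase 2: from source s, repeatedly consume one unused short walk at
        the current connector; its endpoint becomes the next connector."""
        v = s
        for _ in range(segments):
            if not endpoints[v]:
                raise RuntimeError("connector depleted")  # the flaw discussed next
            v = endpoints[v].pop(random.randrange(len(endpoints[v])))
        return v  # distributed like the endpoint of a walk of length segments*lam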

This seems like it works, but with a good old pessimistic mindset we can always find something that breaks. Imagine that some node $u$ is the endpoint of more than $d$ short random walks. Then we can have a situation where $u$ has depleted the collection of random walk endpoints it constructed in phase 1. It would therefore be helpful to show that a node $u$ is unlikely to be visited as a connector more than some fraction of $d$ times, which would mean that $u$'s endpoint supply will, in all likelihood, not be depleted. We will return to this point later; for now, we ignore it and assume this problem will be fixed.

Is the algorithm correct? For correctness, we need to show that the stitched random walk of length $\tau$ is a true random walk. Clearly each short random walk (before stitching) is a true random walk over the graphs $G_1, G_2, ..., G_{\lambda'}$, where $\lambda'$ is an upper bound on the number of rounds phase 1 takes. However, for phase 1 to complete within $\lambda'$ rounds we need to make sure that the congestion through each edge is not too bad. We want to show that in any round of phase 1 the congestion on any edge is at most $O(\log{n})$ with high probability, which fits within one round of the CONGEST$(\log^2{n})$ model.

Fix an arbitrary directed edge $(u,v)$. We let $X_{i}$ be the indicator random variable for the event that "walker" token $i$ (for $1\leq i\leq d$) is forwarded by $u$ along edge $(u,v)$. Then the expected congestion along the directed edge $(u,v)$ is
$$\mathbb{E}\left[\sum_{i=1}^{d}{X_i}\right] = d\cdot 1/d = 1$$
Considering the symmetric case $(v,u)$ as well, each undirected edge is expected to carry $2$ messages. Using a Chernoff bound we get that
$$\mathbb{P}\left[\sum_{i=1}^{d}{X_i} \geq 4\log{n}\right] \leq 2^{-4\log{n}} = n^{-4}$$
Hence with high probability the congestion on every edge is $O(\log{n})$, and each short random walk can extend its length by one step in each round of the first phase.
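
As a numerical sanity check, the congestion on a fixed directed edge is $\sum_{i}{X_i} \sim \mathrm{Binomial}(d, 1/d)$, and we can sample its tail directly (the parameters here are arbitrary):

    import math, random

    n, d, trials = 1024, 50, 100_000
    threshold = 4 * math.log2(n)  # the 4 log n cutoff used above
    # each of u's d tokens independently crosses the fixed edge with prob 1/d
    exceed = sum(
        sum(random.random() < 1 / d for _ in range(d)) >= threshold
        for _ in range(trials)
    )
    print(exceed / trials)  # essentially 0, consistent with the n^-4 bound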

Now, to argue that the stitched random walk is indeed a random walk, we note that each short random walk is independent of every other, since the stitching is done by uniformly sampling an unused short walk at each chosen endpoint. Additionally, the stitching rounds do not affect the distribution of the chosen vertex, since stitching does not advance the random walk: only communication is performed, no random walk steps.

So the algorithm is correct (ignoring the flaw we pointed out above). In the next post we will try to patch this hole by claiming that, with high probability, in a random walk of length $l$ we visit each endpoint of the random walk (stitching vertex) at most $O(d\sqrt{l}/\lambda)$ times.

Saturday, January 29, 2022

What is the mixing time of a random walk?

We already know that a random walk is a Markov chain with the vertices as its states, and we know that an ergodic Markov chain converges to its invariant distribution. But we do not know how fast. The mixing time is a measure of how quickly it converges. We now formally define this concept.

Def 1: The variation distance between two distributions $D_1$ and $D_2$ on a countable state space $S$ is given by $$||D_1-D_2|| = \frac{1}{2}\sum_{x\in S}{|D_1(x) - D_2(x)|}$$

The factor of one half ensures that the variation distance remains between $0$ and $1$.
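
For concreteness, the definition transcribes into a few lines of Python (assuming the distributions are given as state-to-probability dictionaries over a common state space):

    def variation_distance(D1, D2):
        """Total variation distance between two distributions given as
        {state: probability} dictionaries."""
        states = set(D1) | set(D2)
        return 0.5 * sum(abs(D1.get(x, 0.0) - D2.get(x, 0.0)) for x in states)

    # a fair vs. a biased coin: (|0.5-0.7| + |0.5-0.3|)/2 = 0.2
    print(variation_distance({"H": 0.5, "T": 0.5}, {"H": 0.7, "T": 0.3}))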

Def 2: Let $\pi$ be the invariant distribution of an ergodic Markov chain with state space $S$. Let $p_{x}^t$ represent the distribution of the state of the chain starting at state $x$ after $t$ steps. We define $$\Delta_x(t) = ||p^t_x-\pi||$$ and $$\Delta(t) = \max_{x\in S}{\Delta_x(t)}$$

So $\Delta_x(t)$ is the variation distance between the invariant distribution and $p^t_x$, and $\Delta(t)$ is the maximum of these. Now we can define the mixing time.

Def 3: Similarly, we define $$\tau_x(\epsilon) = \min\{t: \Delta_x(t)\leq \epsilon \}$$ and $$\tau(\epsilon)=\max_{x\in S}{\tau_x(\epsilon)}$$ where $\tau_x(\epsilon)$ is the first time at which the variation distance between $p^t_x$ and the invariant distribution is at most $\epsilon$, and $\tau(\epsilon)$ is the maximum of these. $\tau(\epsilon)$ is called the mixing time.

We say that a Markov chain is rapidly mixing if $\tau(\epsilon)$ is polynomial in $\log(1/\epsilon)$ and the size of the problem.
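
For a small chain these definitions can be evaluated directly by powering the transition matrix; a sketch using numpy (row $x$ of $P^t$ is exactly $p^t_x$, and $\pi$ is supplied by the caller):

    import numpy as np

    def mixing_time(P, pi, eps=0.25, t_max=10_000):
        """Smallest t with max_x ||p_x^t - pi|| <= eps, or None if not reached."""
        Pt = np.eye(P.shape[0])
        for t in range(1, t_max + 1):
            Pt = Pt @ P
            delta = 0.5 * np.abs(Pt - pi).sum(axis=1).max()  # Delta(t)
            if delta <= eps:
                return t
        return None

    # lazy random walk on a 4-cycle: stay w.p. 1/2, else move to a neighbour
    P = np.array([[0.5, 0.25, 0.0, 0.25],
                  [0.25, 0.5, 0.25, 0.0],
                  [0.0, 0.25, 0.5, 0.25],
                  [0.25, 0.0, 0.25, 0.5]])
    print(mixing_time(P, np.full(4, 0.25)))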

Monday, January 24, 2022

Random walks against adaptive dynamic graphs

[cite 1] have studied random walks on dynamic graphs against an oblivious adversary. They find that the hitting time of a random walk can be $2^{\Omega(n)}$. However, [cite 2] apply smoothed analysis to show that this lower bound is very fragile in the sense that after $k$-smoothing, the bound drops to $\Omega(n^{5/2}/(\sqrt{k}\log{n}))$. All of this assumes the adversary is oblivious to the random walk's location at each step. One could imagine scenarios in which the adversary knows the walk's position at each step. This is why we are interested in studying random walks against an adaptive adversary.

Claim 1: The hitting and cover times of a random walk are unbounded against an adaptive adversary.

proof. We construct a dynamic graph which forcefully constrains the random walk to two nodes only. Thus the adversary can guarantee that the random walk never hits the desired vertex.


Suppose the adaptive adversary knows the set
$$S(r)=\{v\in V \mid v \text{ has been visited by the random walk in some round} \leq r\}$$
and the node $u_r$ at which the random walk sits at the start of the current round $r$. Then it constructs the dynamic graph $\mathcal{G}_{\text{bad}}=(V, \mathcal{E})$ where $V=\{v_1,v_2,...,v_n\}$ and
$$\mathcal{E}(r)=\left(\bigcup_{w\in S(r)\setminus \{u_r, v\}}{\{\{w, u_r\}\}}\right) \cup \{\{u_r, v\}\} \cup \left(\bigcup_{w\in V\setminus S(r)}{\{\{v, w\}\}}\right)$$
where $v \in S(r)\setminus \{u_r\}$ is an arbitrary previously visited node (in the first round, the adversary simply attaches a single arbitrary neighbour to the start node). We can prove by induction that the random walk only ever explores two nodes of $\mathcal{G}_{\text{bad}}$: once exactly two nodes have been visited, $u_r$'s unique neighbour is the other visited node $v$, so every move is forced between the two.
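
A tiny simulation of this trap (a sketch under the assumption above: in round $0$ the adversary attaches a single arbitrary neighbour to the start node, and from then on every move is forced along the unique edge at the current node):

    def trapped_walk(n, rounds=1000):
        """The current node's only neighbour is the other visited node, so each
        step is forced; the edges from that node to unvisited vertices keep the
        graph connected without ever being used."""
        u, visited = 0, {0}
        for r in range(rounds):
            v = 1 if r == 0 else next(iter(visited - {u}))  # the other visited node
            u = v
            visited.add(u)
        return visited

    print(trapped_walk(50))  # {0, 1}: the walk never escapes its first two nodes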

However, this feels quite strict/contrived. Perhaps the hitting/cover time is only unbounded on a small class of adaptive dynamic graphs. In any case, $k$-smoothing can be used to reduce this to $O(n^3/k)$.

Project Goal

In this blog post I want to make it clear what the project objectives are and give the project an overarching goal.

Goal: To study distributed random walks, their properties and uses, in dynamic graphs.

Namely, we are interested in:
  • hitting time of a random walk in a dynamic graph.
  • cover time of a random walk in a dynamic graph.
  • mixing time of a random walk in a dynamic graph.
  • smoothed versions of these.
  • computing hitting, cover and mixing times on various dynamic graphs by simulation.
  • application of distributed random walks to solve $k$-gossip.
  • other applications of distributed random walks.
Perhaps we could look into various algorithms for page rank using random walks.

Thursday, January 20, 2022

Random walks on graphs

In order to better understand the random walk based algorithm of [cite 1] for $k$-gossip, I will be exploring the concept of random walks on both static and dynamic graphs.

Section 1: Markov Chains

A Markov chain can be thought of as a directed graph where edge $(u,v)$ has a probability label $P_{uv}$ denoting the probability that we move from node/state $u$ to node/state $v$. We can encode this directed graph in a matrix $P$, where $P_{ij}$ is the probability label of the edge $(i,j)$ in the graph.

Equivalently, we can define a Markov chain as a sequence of random variables $\{X_t\}_{t\geq 0}$, where $X_t$ is the state of the Markov chain at time $t$. Then we can say that $P_{ij}=\mathbb{P}[X_t=j | X_{t-1}=i]$.

The defining characteristic of a Markov chain is that it is "memoryless", i.e. the value of the chain at time $t$ depends only on its value at time $t-1$. So the history of state transitions before time $t-1$ is already captured by $X_{t-1}$. This is formally written as
$$\mathbb{P}[X_t=a_t | X_{t-1}=a_{t-1}, X_{t-2}=a_{t-2}, ..., X_0=a_0] = \mathbb{P}[X_t=a_t | X_{t-1}=a_{t-1}]$$
To get any one-step transition probability $P_{ij}$ we can just read it off the matrix $P$, but what about a multi-step transition probability? We want to compute $\mathbb{P}[X_t=j | X_0=i]$. Conditioning on the first step, we can write this as
$$\mathbb{P}[X_t=j | X_0=i] = \sum_{k}{\mathbb{P}[X_1=k | X_0=i] \cdot \mathbb{P}[X_t=j | X_1=k]} = \sum_{k}{P_{ik}\cdot \mathbb{P}[X_t=j | X_1=k]}$$
Therefore, we can inductively prove that $$\mathbb{P}[X_t=j | X_0=i] = P_{ij}^{t}$$ where $P_{ij}^{t}$ denotes the $(i,j)$ entry of the matrix power $P^t$.
Now suppose that the probabilities of being in each state at time $t$ are given as a row vector $p(t)$. We can extend our result above to show that 
$$p(t)=p(t-1)\cdot P=p(0)\cdot P^t$$
It is natural to ask whether $p(t)$ converges to some invariant/stationary distribution, which would indicate that the Markov chain reaches some sort of equilibrium. Indeed, we can compute this invariant distribution $\pi$ by solving
$$\pi=\pi P$$
together with the normalisation $\sum_{i}{\pi_i}=1$.
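
Numerically, both $p(t)$ and $\pi$ fall out of a few lines of linear algebra; a sketch for a small two-state chain:

    import numpy as np

    # two-state chain: stay w.p. 0.9 in state 0 and w.p. 0.8 in state 1
    P = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
    p = np.array([1.0, 0.0])  # p(0): start in state 0
    for _ in range(100):      # p(t) = p(t-1) P
        p = p @ P
    print(p)                  # converges to the invariant distribution

    # pi = pi P: the left eigenvector of P for eigenvalue 1, normalised
    w, V = np.linalg.eig(P.T)
    pi = np.real(V[:, np.argmin(np.abs(w - 1))])
    print(pi / pi.sum())      # [2/3, 1/3]
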
The following theorem gives sufficient conditions for a Markov chain on $n$ states to converge to a unique invariant distribution.

Thm 1: Any finite, strongly connected and aperiodic Markov chain has the following properties:
  1. the chain has a unique invariant distribution $\pi=(\pi_1,\pi_2, ..., \pi_n)$;
  2. for all $i$ and $j$, the limit $\lim_{t\to\infty}{P^{t}_{j,i}}$ exists and is independent of $j$;
  3. $\pi_i=\lim_{t\to\infty}{P^{t}_{j,i}}=1/h_{i,i}$, where $h_{i,j}$ is the expected time for the Markov chain to go from state $i$ to state $j$.
We need the Markov chain to be strongly connected because if we have a Markov chain with two strongly connected components, the chain will have two invariant distributions: one for each component. A Markov chain is aperiodic if the $\gcd$ of the lengths of all closed walks from a state back to itself is $1$, i.e. there is no period dictating when we can be in a state. If the Markov chain is periodic, then it will not converge to $\pi$ because the distribution will flip-flop based on the periodicity of the chain.

The theorem says that the long-run probability of being in state $i$ is the limit of $P^{t}_{j,i}$ as $t\to\infty$, and that this limit is independent of the start state $j$. So if we run the chain long enough, the start state does not affect its long-term behaviour. Intuitively, if it takes $h_{i,i}$ steps on average to return from state $i$ to itself, then we expect to be in state $i$ a $1/h_{i,i}$ fraction of the time.

Section 2: Random walks on undirected static graphs

A random walk on an undirected static graph $G$ is a sequence of moves of a particle between the vertices of $G$. If the particle is at vertex $u$, then it moves to each of $u$'s neighbors with probability $1/\text{deg}(u)$. This is a Markov chain where $X_t$ is a random variable denoting the node the particle occupies at time $t$.

Since we want to avoid periodic Markov chains, we must ensure that the graph on which we perform the random walk does not restrict the times at which a node can be reached to multiples of some constant.

Bipartite graphs are therefore no good: if we are at a node on one side, we cannot reach any node on the same side in an odd number of time steps. This means that the random walk is periodic and thus will not converge to the invariant distribution. This motivates the following theorem.

Thm 1: A random walk on an undirected graph $G$ is aperiodic if and only if $G$ is not bipartite.

proof. The forward direction was proven by the paragraph above, where we concluded that a random walk on a bipartite graph is periodic. We now need to show that only bipartite graphs make our random walks periodic. Suppose $G$ is not bipartite; then $G$ contains a cycle of odd length, so every node lies on a closed walk of odd length (walk to the odd cycle, go around it, and walk back). Every node also lies on a closed walk of length $2$ (to a neighbor and back). Since $\gcd(2, \text{odd})=1$, the Markov chain is aperiodic.

So if we work with a non-bipartite connected graph $G$, we know that a random walk on $G$ will converge to the invariant distribution. What is this invariant distribution? By looking at $\pi=\pi P$ and using the fact that $P_{ij}=1/\text{deg}(i)$ for each neighbor $j$ of $i$, we get that $\pi_i=\text{deg}(i)/2m$, where $m$ is the number of edges. Intuitively, this makes sense: the more edges a node has connected to it, the more ways the random walk can end up at the node.
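
We can verify $\pi_i=\text{deg}(i)/2m$ by plugging it into $\pi=\pi P$ on a small example graph:

    import numpy as np

    # a small non-bipartite graph: a triangle with a pendant vertex
    edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
    n, m = 4, len(edges)
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1
    deg = A.sum(axis=1)
    P = A / deg[:, None]            # P_ij = 1/deg(i) for each neighbour j
    pi = deg / (2 * m)              # claimed invariant distribution
    print(np.allclose(pi @ P, pi))  # True: pi = pi P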

(As an aside, this gave me the idea of studying the mixing time of random walks against an adaptive adversary. The adversary will make the graph bipartite in each round because this means the random walk will be periodic and so will not converge to the invariant distribution. But if we introduce smoothing, we can probably make it non-bipartite with high probability.)

We now introduce some interesting values of random walks which we want to study.

The hitting time, $h_{u,v}$, was introduced previously in terms of Markov chains. In the context of random walks, it is the expected time for the random walk to go from node $u$ to node $v$. The commute time between $u$ and $v$ is given by $h_{u,v} + h_{v,u}$.

The cover time is the maximum, over all starting vertices, of the expected time for the random walk to visit all vertices.

Lemma 1: If $(u,v)\in E$ then $h_{u,v}+h_{v,u}\leq 2m$

We can justify this intuitively: since the stationary distribution is $\pi_i=\deg(i)/2m$, we expect to be in state $i$ for that fraction of the time, so on average the time between consecutive visits to $i$ is $2m/\deg(i)$.

Lemma 2: The cover time is $\leq 2mn$

Since our graph is connected, there must exist a spanning tree $T$. We can cover every vertex by visiting the nodes in depth-first-search order along $T$. Suppose this yields the closed tour $u, w_1, w_2, ..., w_r, u$ of length $2(n-1)$ (each tree edge is traversed once in each direction). Then we have
$$\mathbb{E}[\text{steps to visit all vertices}]\leq h_{u,w_1} + h_{w_1,w_2} + \cdots + h_{w_r,u}$$
Pairing up the two traversals of each tree edge and applying Lemma 1 to each of the $n-1$ tree edges, this sum is at most $2m(n-1)\leq 2mn$.

We are now interested in another property of Markov chains, the mixing time. This tells us how quickly a Markov chain converges to the invariant distribution which is useful when sampling nodes in a graph for example. We will cover this in the next post.



Sunday, January 16, 2022

$k$-gossip with smoothing

We have so far analysed gossip on connected dynamic graphs; however, we have not yet looked at this problem under smoothing. We use the analysis conducted by [cite1] and [cite2] as a black box to derive an $s$-smoothed upper bound for $k$-gossip.

Here we assume that each node in our dynamic graph is given $n$ and $k$ as part of its input; we also assume there is some agreed-upon ordering of the tokens. This leads to a very simple $O(nk)$ algorithm without smoothing.

Algorithm for each node:

  • For $i$ from $1$ to $k$ do
    • For $n-1$ rounds broadcast the $i^{\text{th}}$ smallest token the node knows.
This works because the connectivity of the dynamic graph ensures that, in each round of a phase, at least one new node learns the token being flooded, so after $n-1$ rounds every node has it. Because this protocol breaks $k$-gossip up into $k$ phases of flooding a single token, we can use results from smoothed flooding for $k$-gossip.
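
The phase structure is easy to simulate centrally. In the sketch below, `edges_at(r)` is an assumed oracle returning the (connected) edge set of round $r$, and tokens are integers so that the agreed-upon ordering is just the usual one:

    def k_gossip(n, k, start_tokens, edges_at):
        """k phases; in phase i every node broadcasts, for n-1 rounds, the i-th
        smallest token it currently knows (or nothing if it knows fewer)."""
        know = [set(ts) for ts in start_tokens]
        r = 0
        for i in range(k):
            for _ in range(n - 1):
                sent = [sorted(know[v])[i] if len(know[v]) > i else None
                        for v in range(n)]
                for u, v in edges_at(r):
                    if sent[u] is not None:
                        know[v].add(sent[u])
                    if sent[v] is not None:
                        know[u].add(sent[v])
                r += 1
        return know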

We know that flooding a token on a connected dynamic graph under $s$-smoothing against an oblivious adversary terminates within $O(n^{2/3}\log{n}/s^{1/3})$ rounds with probability at least $1-\frac{1}{n}$. So instead of iterating $n-1$ times we can iterate $O(n^{2/3}\log{n}/s^{1/3})$ times.

Let $F_i$ be the event that in phase $i$, flooding does not terminate in $O(n^{2/3}\log{n}/s^{1/3})$ rounds; then $\mathbb{P}[F_i]\leq\frac{1}{n}$. The probability that any $F_i$ occurs for $1\leq i \leq k$ is then union bounded:
$$\mathbb{P}\left[\bigcup_{i=1}^{k}{F_i}\right]\leq \sum_{i=1}^{k}{\mathbb{P}[F_i]}\leq\frac{k}{n}$$
Therefore, the probability that none of the phases fail is at least $1-\frac{k}{n}$. This is not a high-probability guarantee, since $k$ can be as large as $n$. Thus we need to strengthen the upper bound for flooding so that the failure probability is at most $\frac{1}{n^3}$ per vertex per phase.

In flooding, we needed an edge between two sets, each of size $\geq r$, where $r$ is the number of rounds in one of the algorithm's phases. Under $s$-smoothing this happens in a given round with probability at least $c_1sr^2/n^2$ for some constant $c_1$. Over $r$ rounds, the probability of this never happening is at most $(1-c_1sr^2/n^2)^r$. Using the inequality $1-x\leq e^{-x}$ we get the upper bound $(1-c_1sr^2/n^2)^r \leq e^{-c_1r^3s/n^2}$.

We are union bounding twice: once over the $n$ vertices and once over the $k\leq n$ tokens. Therefore, we need flooding to fail with probability at most $1/n^3$, so we need to pick a value for $r$ s.t.
$$e^{-c_1r^3s/n^2}\leq 1/n^3$$
solving this for $r$ yields that
$$r\geq n^{2/3}(3\ln{n})^{1/3}/(c_1s)^{1/3}$$

Result Oblivious: $k$-gossip terminates within $O(kn^{2/3}(\log{n}/s)^{1/3})$ rounds with probability at least $1-1/n$ against an oblivious adversary under $s$-smoothing.

We can extend this result to the case of an adaptive adversary. Here flooding requires $O(n\sqrt{\log{n}/s})$ rounds to terminate. In a previous blog post, I went over why the probability of a helpful edge being added is at least $c_1sr/n^2$, which must occur at some point within $r$ rounds for an arbitrary node $v$ to receive the flooded message. Thus the probability that flooding fails is at most $(1-c_1sr/n^2)^{r}$. Using $1-x\leq e^{-x}$ again we can deduce that
$$(1-c_1sr/n^2)^{r}\leq e^{-c_1r^2s/n^2}$$
We want $e^{-c_1r^2s/n^2} \leq 1/n^3$ and so solving for $r$ we get
$$r\geq n\sqrt{3\ln{n}/c_1s}$$

Result Adaptive: $k$-gossip terminates within $O(kn\sqrt{\log{n}/s})$ rounds with probability at least $1-1/n$ against an adaptive adversary under $s$-smoothing.

Note that the adaptive result improves on the $O(nk)$ upper bound when $\sqrt{\log{n}/s}\leq 1$, which happens when $n\leq e^s$.
