Wednesday, November 24, 2021

Revised project aims

Quick project goal: The main goal of this project is to study the information-dissemination task of $k$-gossip under smoothing. $k$-gossip is a fundamental primitive in distributed networks and can be used to enforce data replication, which has applications in the very popular blockchain networks. I want to see how $k$-gossip's lower and upper bounds are affected in dynamically changing networks if we introduce smoothing to the network graph for each round of synchronous communication. This will give us a better understanding of how efficient our current algorithms are on "realistic" inputs for $k$-gossip.

The revised project aims are as follows:

  1. Present known unsmoothed and smoothed lower and upper bounds for flooding and the maximum hitting time of a random walk against oblivious adversaries, as an introduction to the techniques used for the later analysis of $k$-gossip.
  2. Present/Develop unsmoothed lower and upper bounds for $k$-gossip against an oblivious adversary.
  3. Present the known unsmoothed lower and upper bounds for $k$-gossip against an adaptive adversary.
  4. Use 2 to develop smoothed lower and upper bounds for $k$-gossip against an oblivious adversary.
  5. Use 3 to develop smoothed lower and upper bounds for $k$-gossip against an adaptive adversary.
Aims 4 and 5 will take the bulk of my time since these would be new results if completed successfully. If I fail to complete 4 and 5, I may replace them with an implementation of the best known algorithm for $k$-gossip on dynamic graphs against an oblivious/adaptive adversary. The program could then compare its runtime on unsmoothed inputs with its runtime on smoothed inputs.

Progress made:
  • Presented $\Omega(n^{2/3}/k^{1/3})$ smoothed lower bound for flooding against an oblivious adversary. (part of aim 1)
  • Presented $\Omega(n\log{k})$ unsmoothed lower bound for $k$-gossip against an adaptive adversary. (part of aim 3)
Plan for Christmas holiday:
  1. Complete aim 1.
  2. Complete aim 2.
  3. Complete aim 3.
Progress reflection: I am currently behind schedule because I have not managed my workload for this term well. I also did not allocate enough time in term 1 for just gaining the relevant background to tackle aims 4 and 5. My goal is to make up for my lateness over Christmas where I will be able to focus purely on my project.

This will mean term 2 will be entirely focused on the hardest aims: 4 and 5. The Christmas holidays will be used as a way to complete all preliminaries needed to give me the background to tackle 4 and 5.

Tuesday, November 23, 2021

$\Omega(nk/\log{n})$ lower bound for $k$-gossip

In an earlier post, I struggled through the proof of the $\Omega(n\log{k})$ lower bound for the online $k$-gossip problem in connected dynamic graphs. I plan to extend this result by covering the proof of the $\Omega(nk/\log{n})$ lower bound for the same problem given by [Dut+11].

The adversary's strategy is very similar to that for the $\Omega(n\log{k})$ lower bound. The adversary adds all free edges, leaving the connected components $C_1, C_2, ..., C_l$. Then arbitrary representatives $v_1, v_2, ..., v_l$, one from each component, are chosen and connected in a line. This graph $G_r$ will be the adversary's choice for round $r$. Our aim is to argue that any token-forwarding algorithm for $k$-gossip has lower bound $\Omega(nk/\log{n})$ against this adversary.

The first new concept is the idea of a half-empty configuration which is used to bound the progress made in each round. 

Def 1: Given a sequence of nodes $v_1, v_2, ..., v_l$, we say that this sequence is half empty in round $r$ with respect to a sequence of tokens $t_1, t_2, ..., t_l$ if at the start of round $r$ we have for all $1\leq i, j \leq l, i\not= j$ that either $v_i$ is missing $t_j$ or $v_j$ is missing $t_i$.
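The definition translates directly into a predicate, which is handy for sanity-checking small examples. A minimal sketch of my own (names like `has_token` are illustrative, not from the paper):

```python
def is_half_empty(nodes, tokens, has_token):
    # nodes[i] is v_i, tokens[i] is t_i, and has_token[v] is the set of
    # tokens node v holds at the start of the round. The sequence is
    # half-empty if for every pair i != j, v_i is missing t_j or
    # v_j is missing t_i.
    l = len(nodes)
    return all(tokens[j] not in has_token[nodes[i]] or
               tokens[i] not in has_token[nodes[j]]
               for i in range(l) for j in range(l) if i != j)
```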

Our proof strategy will be to show that the half-empty configurations must have small size, which limits the progress in each round to a small amount.

Lemma 1: If $m$ useful token exchanges occur in round $r$, then there exists a half-empty configuration of size at least $m/2 + 1$ at the start of round $r$ in $G_r$.

Note that a useful token exchange is one which occurs along a non-free edge.

proof. Since the $m$ useful token exchanges must occur along non-free edges, and each non-free edge contributes at most two useful token exchanges, there must be at least $m/2$ non-free edges. By the construction of $G_r$, a non-free edge must join two nodes $(u, v)$ in different connected components of $G_r \setminus \{\text{non-free edges}\}$. Since we have at least $m/2$ non-free edges, we have at least $m/2 + 1$ connected components in $G_r \setminus \{\text{non-free edges}\}$.

Now we want to show that for any $u, v$ as described above, there are two tokens $t_u$ and $t_v$ such that either $u$ is missing $t_v$ or $v$ is missing $t_u$. Suppose for contradiction that $u$ knows $t_v$ and $v$ knows $t_u$ for every pair of tokens $(t_u, t_v)$. Then $(u, v)$ must be a free edge, a contradiction.

We can use this idea on every pair $(v_i, v_j)$ for $1\leq i < j \leq l$ to form a half-empty sequence of length $m/2 + 1$.

Lemma 2: If a sequence $v_1, v_2, ..., v_l$ of nodes is half-empty with respect to $t_1, t_2, ..., t_l$ at the start of round $r$, then $v_1, v_2, ..., v_l$ is half-empty with respect to $t_1, t_2, ..., t_l$ at the start of any round $r' < r$.

proof. Suppose for contradiction that in round $r' < r$, $v_1, v_2, ..., v_l$ is not half-empty with respect to $t_1, t_2, ..., t_l$. Then for some pair $(v_i, v_j)$ we must have that $v_i$ knows $t_j$ and $v_j$ knows $t_i$. If this is true in round $r'<r$, then it is still true in round $r$. A contradiction.

Now we prove that for a specific initial token distribution, the $\Omega(nk/\log{n})$ lower bound holds.

First note that if $k \leq c\log{n}$ for a constant $c$, then $nk/\log{n} = O(n)$, and an $\Omega(n)$ lower bound already holds even for $k=1$ under our adversary's construction $G_r$, so the claimed bound is immediate. We can therefore assume that $k > c\log{n}$.

Theorem 1: From an initial token distribution in which each node has each token independently with probability $3/4$, any online token-forwarding algorithm will need $\Omega(kn/\log{n})$ rounds to complete with high probability against a strong adversary.

By the way, when we say with high probability (w.h.p.), we mean with probability $\geq 1 - c/n^{\alpha}$ for constants $c, \alpha > 0$.

proof strategy. If we can get a lower bound $X$ on the number of tokens each node is missing w.h.p. before the first round, and an upper bound $U$ on the number of useful token exchanges in each round, then at least $nX/U$ rounds are needed to complete $k$-gossip.

proof. We first get an upper bound on the number of useful token exchanges in each round. Naturally, if we can show that the largest half-empty sequence is small w.h.p., then by Lemma 1 the number of useful exchanges per round is also small.

Let $E_l$ denote the event that there exists a half-empty configuration of size $l$ at the start of the first round. We upper bound the probability of $E_l$ occurring.

$$\mathbb{P}[E_l] \leq {n \choose l} \cdot \frac{k!}{(k-l)!}\cdot \left(\frac{1}{2}\right)^{{l \choose 2}} \leq n^l \cdot k^l \cdot 2^{-l(l-1)/2} \leq \frac{2^{2l\log_2{n}}}{2^{l(l-1)/2}}$$

i.e. we have ${n \choose l}$ ways to select $l$ nodes for a half-empty configuration, $\frac{k!}{(k-l)!}$ ways to select the corresponding token sequence, and for each pair $(v_i, v_j)$ in the half-empty configuration a probability $\leq 1/2$ that $v_i$ is missing $t_j$ or $v_j$ is missing $t_i$.

If we let $l=5\log_2{n}$, then we get $\mathbb{P}[E_l] \leq 1/2^{(5/2)\log_2{n}(\log_2{n}-1)} = 1/n^{(5/2)(\log_2{n}-1)} \leq 1/n^2$. Note that if a half-empty configuration of size $l$ exists, then we also have half-empty configurations of sizes $1, 2, ..., l$. Thus with probability at least $1-1/n^2$, no half-empty configuration of size $\geq 5\log_2{n}$ exists at the start of the first round ... to be continued.
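To get a feel for the numbers, here is a quick numeric check (my own sketch) of the upper bound $2^{2l\log_2{n}}/2^{l(l-1)/2}$ on $\mathbb{P}[E_l]$:

```python
import math

def e_l_bound(n, l):
    # The upper bound 2^(2 l log2 n) / 2^(l(l-1)/2) on P[E_l] derived above.
    return 2.0 ** (2 * l * math.log2(n) - l * (l - 1) / 2)
```

For $n=1024$ we have $\log_2{n}=10$ and $l=50$, so the exponent is $2\cdot 50\cdot 10 - 50\cdot 49/2 = -225$, comfortably below the $-20$ needed for $1/n^2$.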

[Dut+11] Chinmoy Dutta et al. “Information Spreading in Dynamic Networks”. In: CoRR abs/1112.0384 (2011). arXiv: 1112.0384. url: http://arxiv.org/abs/1112.0384.

Friday, November 12, 2021

First proper Codeforces round

This was my first Codeforces round (Div2 754) where I competed live. It was very fun and I solved the first three problems. My main issue was not the actual solutions but mainly the implementations. I spent more time debugging than thinking which was a bit of a pain since I just wanted to get on with the harder problems. I'm going to give brief solutions for the first three problems.

Problem A: You are given three positive integers $a_1, a_2$ and $a_3$ and a function $d(a_1, a_2, a_3) = |a_1 + a_3 -2\cdot a_2|$. We can pick any two of the three numbers and add $1$ to one of them and subtract $1$ from the other. The problem asks for the minimum value of $d(a_1, a_2, a_3)$ that can be achieved given the allowed move.

The idea is that each move changes $a_1+a_3-2\cdot a_2$ by $0$ or $\pm 3$, so only its residue mod $3$ matters: the answer is $0$ if $a_1+a_3-2\cdot a_2 \equiv 0 \pmod 3$ and $1$ otherwise.
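A minimal sketch of this solution (my own code):

```python
def min_d(a1, a2, a3):
    # A move changes a1 + a3 - 2*a2 by 0 or +-3, so only the residue
    # mod 3 matters: residue 0 can reach 0, anything else bottoms out at 1.
    return 0 if (a1 + a3 - 2 * a2) % 3 == 0 else 1
```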

Problem B: We are given a binary string $b$ of length $n$. One move consists of the following two steps:

  1. Choose a subsequence of any length such that the elements are in non-increasing order.
  2. Reverse the chosen subsequence.
We want to find the minimum number of moves needed to sort the binary string in non-decreasing order.

The first thing to notice is that if $b$ has $x$ $1$s, then in sorted order the last $x$ bits are all $1$s. Suppose the last $x$ bits currently contain $y$ $0$s. Then there must be exactly $y$ $1$s preceding the last $x$ bits. We can select those $y$ $1$s followed by the $y$ $0$s within the last $x$ bits as our subsequence and reverse it. This subsequence is non-increasing and reversing it sorts $b$ in non-decreasing order. So the answer is $0$ if $b$ is already sorted and $1$ otherwise, giving an $O(n)$ solution.
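Since one move always suffices, the whole solution collapses to a sortedness check. A sketch (my own code):

```python
def min_moves(b):
    # One reversal of the y 1s before the last x bits together with the
    # y 0s inside them always sorts b, so the answer is 0 if b is
    # already non-decreasing and 1 otherwise.
    return 0 if list(b) == sorted(b) else 1
```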

Problem C: We are given a string $s$ of length $n$ consisting of only the characters $a$, $b$ or $c$. We want to find the length of the smallest substring of $s$ which satisfies:
  1. Substring has length $\geq 2$.
  2. $a$ occurs strictly more times than $b$.
  3. $a$ occurs strictly more times than $c$.
The only substring of length $2$ satisfying these conditions is $aa$. If it is not present, then any satisfying substring must have the form $(a(b\cup c)^{*})^* \cup a$. The possible substrings of length $3$ are $aba$ and $aca$, and of length $4$ are $abca$ and $acba$. No satisfying substring of length $5$ or $6$ can be minimal, since something like $ababa$ or $abcaba$ contains the smaller substring $aba$, which works too. Importantly, there are $2$ substrings of length $7$ that work: $abbacca$ and $accabba$. Anything larger would necessarily include a smaller satisfying substring within it. Thus we just check for these candidates and output the smallest found. Runs in $O(n)$.
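The case analysis reduces to scanning for a fixed candidate list. A sketch of my own (the $-1$ "no answer" convention here is my assumption):

```python
def shortest_fancy(s):
    # Candidate substrings from the case analysis, grouped by length;
    # return the first length with a match, or -1 if none occurs.
    groups = [["aa"], ["aba", "aca"], ["abca", "acba"],
              ["abbacca", "accabba"]]
    for group in groups:
        if any(c in s for c in group):
            return len(group[0])
    return -1
```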

Although I didn't solve $D$ in the contest, I still had some ideas for it. I will present these below.

Problem D: We are playing a game on a tree between two players $A$ and $B$, where $A$ moves first. $A$ first chooses a node in the tree to place a token on. For the remaining turns each player can move the token from $u$ to $v$ iff
  1. $\{u,v\}$ is an edge,
  2. $v$ has not already been visited, and
  3. $u\oplus v \leq \min(u,v)$.
$A$ wins if it is $B$'s turn and $B$ cannot make a move (and conversely for $B$ winning). We are asked to assign the numbers $\{1, 2, ..., n\}$ to the nodes of the tree, each used exactly once, such that the number of starting nodes from which $A$ wins is maximized.

My first instinct was to check when exactly $u\oplus v \leq \min(u, v)$. I observed that $u \oplus v \leq \min(u,v)$ iff $u$ and $v$ have the same number of bits in their binary representation. The proof is easy. I partitioned the set of integers from $1$ to $n$ into sets where any pair in the same set satisfies $u\oplus v \leq \min(u,v)$.

$$\{1\}$$
$$\{2, 3\}$$
$$\{4, 5, 6, 7\}$$
$$\{8, 9, 10, 11, 12, 13, 14, 15\}$$
$$...$$
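The observation behind this partition is easy to verify exhaustively for small values; a quick sketch (my own code):

```python
def can_move(u, v):
    # The move condition from the problem: u xor v <= min(u, v).
    return (u ^ v) <= min(u, v)

def same_bit_length(u, v):
    # The claimed equivalent condition: u and v have the same number
    # of bits in their binary representations.
    return u.bit_length() == v.bit_length()
```

Checking `can_move(u, v) == same_bit_length(u, v)` for all $1\leq u, v < 256$ confirms the equivalence.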

A natural strategy is to try to label the nodes such that for any $\{u,v\}$ edge in the tree, we have $u\oplus v > \min(u,v)$, i.e. $u$ and $v$ are not in the same partition as described above. I played around with some examples and for the examples I tried, I could always find a labeling where this was true, hence a labeling where $A$ wins for all nodes. Importantly, as far as I could tell, the line graph seems like the hardest graph to do this on since we have $n-1$ edges which need to have nodes labelled from different sets, as opposed to other graphs where a node can have multiple children, which can have labels from the same set.

At this point I was convinced that we could always label the nodes such that $A$ wins on all of them. To generate this labeling, it felt like a recursive procedure could work, starting from the leaves, and working inwards. Sadly, I ran out of time.

Reflection:

I solved $A$ a bit slowly, but stayed calm throughout. $B$ went very well, except I misread the problem output, which cost me a lot of time (over $20$ mins) even though I found the solution really quickly. $C$ was annoying because of all the tricky cases, but I found them all eventually. A good contest for me.

Sunday, November 7, 2021

$\Omega(n^{2/3}/k^{1/3})$ smoothed lower bound for flooding.

In the static flooding problem, we are given a graph $G=(V,E)$ and one node $v\in V$ wants to disseminate some information to all other nodes in $G$. We consider this problem in the distributed dynamic setting.

It is known that flooding requires $\Omega(n)$ rounds to complete on some static graphs (consider the line graph). But we can parameterize by the static graph's diameter $D$ instead, since every connected graph has $D\leq n$. In this case we need $\Theta(D)$ rounds.

However, in the dynamic graph setting we can construct a graph $\mathcal{G}$ where the diameter is constant, but we still require $\Omega(n)$ rounds for flooding to complete. So we can no longer parameterize the round complexity in terms of $D$.

We will aim to give a smoothed lower bound for flooding, where the proof presented here is taken from [Din+15].

Section 1: The spooling graph. 

We want a dynamic graph where even though the diameter is constant, flooding requires $\Omega(n)$ rounds to complete. Such a graph is called the spooling graph.

For $1\leq r\leq n-1$, the $r$-spooling graph is a graph consisting of a left star of nodes $\{1, ..., r\}$ centered at $r$ and a second right star of nodes $\{r+1, ..., n\}$ centered around $r+1$ with the edge $\{r, r+1\}$ connecting the two stars. Clearly this graph has constant diameter $3$ across rounds.

Consider the case where node $1$ starts with the message to be flooded. Then we need $\Omega(n)$ rounds for flooding to complete since only one node learns the message per round.
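This one-node-per-round behaviour is easy to check with a small simulation (my own sketch; it assumes the standard synchronous model where a node informed at the start of a round informs all its neighbours that round):

```python
def flood_rounds_on_spooling(n):
    # Round r uses the r-spooling graph: a left star {1..r} centered at r,
    # a right star {r+1..n} centered at r+1, and the bridge {r, r+1}.
    # Node 1 starts with the message; count rounds until all n nodes have it.
    informed = {1}
    r = 0
    while len(informed) < n:
        r += 1
        rr = min(r, n - 1)
        edges = ([(i, rr) for i in range(1, rr)]                # left star
                 + [(j, rr + 1) for j in range(rr + 2, n + 1)]  # right star
                 + [(rr, rr + 1)])                              # bridge
        # Synchronous round: informed nodes spread over current edges.
        new = set(informed)
        for u, v in edges:
            if u in informed or v in informed:
                new.update((u, v))
        informed = new
    return r
```

`flood_rounds_on_spooling(n)` returns $n-1$: exactly one new node learns the message each round.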

[Figure: the first 3 rounds of flooding from node 1 on the spooling graph.]

Our task is to prove that the $\Omega(n)$ lower bound can be improved (with some probability) by smoothing our worst-case dynamic graph.

Section 2: The smoothed lower bound.

For our lower bound, we assume that the process of $k$-smoothing can only add edges. This is because if we need $\Omega(R)$ rounds when only adding edges, then we certainly need $\Omega(R)$ rounds when both adding and removing edges.

We want to show that given a smoothed spooling graph, we need at least $R$ rounds for flooding to complete with some probability. Then we can say that flooding has a $\Omega(R)$ smoothed lower bound. For the flooding to last at least $R$ rounds, we want to show that it is likely that in the first $R-1$ rounds at least one node will not yet have received the message. A node $v$ can learn the flooded message in one of the following ways in round $r$:
  1. $v$ is numbered $r+1$ and will therefore learn the message from the node numbered $r$,
  2. there is a smoothed edge between some node in the left star and $v$ in the right star,
  3. $r+1$ knows the message and $v$ is in the right star.
If case $(1)$ occurs, we don't really mind, since only a single node will learn the message that round. This cannot be avoided and all nodes $x\in[1, R]$ will learn the message in this way. 

On the other hand, if case $(3)$ occurs even once, say in round $r$, then we know that by the end of round $r$ all nodes will have learned the message. Thus case $(3)$ is said to fail if it occurs even once.

If case $(2)$ happens too often, then we risk all nodes in the right star learning the message. Thus we say that case $(2)$ fails if more than $R$ nodes learn the message in this way. We must be careful to eventually pick an $R$ which satisfies $2R<n$ for sufficiently large $n$, so that at least one node will not have received the message if case $(2)$ does not fail.

Let's try to upper bound the probability that $(3)$ happens in the first $R-1$ rounds.

Obs 1: For $(3)$ to happen, we need some node numbered $x\in[1, R]$ to learn the message early (i.e. in a round $r<x-1$) via a smoothed edge.

Consider each round $r<R$ where case $(2)$ and $(3)$ have not failed yet. Then we define the set $S$ to be the set of edges between a node in the left star (has the message) to a node in $[r+2, R]$. Note that $|S|=O(R^2)$.

We use the following lemma (discussed in previous blog post) to help us bound the probability that a smoothed edge as described in observation $1$ is added. 

Lemma 1: There exists constant $c_3 > 0$ such that the following holds. Consider any graph $G = (V, E)\in\text{allowed}(\mathcal{G}_{\text{conn}})$. Consider also any set $S$ of edges and smoothing value $k \leq n/16$ such that $S \cap E = \emptyset$. Then with probability at most $c_3k|S|/n^2$ , the $k$-smoothed graph $G'$ of $G$ contains an edge from $S$.

Thus, the probability of adding an edge between a node with the message and a node $x$ as described in observation $1$ is $\leq c_3k|S|/n^2=O(kR^2/n^2)$. We then take a union bound over the $R$ rounds, so that the probability of $(3)$ occurring is $O(kR^3/n^2)$. If we let $R=\alpha n^{2/3}/k^{1/3}$ for some $\alpha\leq 1$, then we get $O(\alpha^3)$, which for sufficiently small $\alpha$ is at most $1/4$. 

Note also that for sufficiently large $n$, we have $2R=2\alpha n^{2/3}/k^{1/3} < n$, which was mentioned above to be necessary.

Now we want to upper bound the probability of case $(2)$ failing, i.e. the probability that more than $R$ nodes learn the message by case $(2)$. If we get the expected number of nodes learning the message by case $(2)$ over the $R-1$ rounds, then we can use Markov's inequality to upper bound the probability of case $(2)$ failing.

Consider each round $r<R$ where cases $(2)$ and $(3)$ have not yet failed. Therefore, up to $2R$ nodes know the message. Fix a node in the right star, say $v$, then let $S=\{\{u, v\} | u\in\text{ left star }\}$. By lemma 1, the probability that node $v$ learns the message by case $2$ is $\leq c_3k(2R)/n^2=O(kR/n^2)$. Now, we introduce the following indicator variable.
$$X_u = 1 \text{ if }u\in\text{ right star and learns the message by case } (2) \text{ in round } r \text{ otherwise } 0$$
Then the expected number of nodes in the right star which will learn the message by case $(2)$ in round $r$ is given by
$$\mathbb{E}\left[\sum_{u\in\text{ right star}}{X_u}\right] = \sum_{u\in\text{ right star}}{\mathbb{E}[X_u]} \leq O(n)\cdot O(kR/n^2)=O(kR/n)$$
Summing over the $R-1$ rounds and letting $R=\alpha n^{2/3}/k^{1/3}$, we have $O(kR^2/n)=O(\alpha^2 n^{1/3}k^{1/3})=O(\alpha R)$ expected nodes learning the message by case $(2)$, where the last equality holds for $k=O(\sqrt{n})$.

For sufficiently small $\alpha$, the expected total number of nodes that learn the message by case $(2)$ is at most $R/4$. We can now apply Markov's inequality.

Markov's Inequality: Let $X$ be a non-negative random variable. For every $a>0$ we have
$$\mathbb{P}(X\geq a) \leq \frac{\mathbb{E}[X]}{a}$$

Hence we have 
$$\mathbb{P}(\text{total number of nodes that learn the message by case } (2)\geq R) \leq \mathbb{E}[\text{total number of nodes that learn the message by case } (2)]/R \leq (R/4)/R=1/4$$
Thus we fail case $(2)$ with probability at most $1/4$.

Putting everything together, we have at most $1/4+1/4=1/2$ probability of failing either case $(2)$ or case $(3)$. Therefore, we have probability at least $1/2$ of both case $(2)$ and case $(3)$ passing.

Theorem: Given a $k$-smoothed $n$-vertex spooling graph, with $k \leq \sqrt{n}$ and sufficiently large $n$, with probability at least $1/2$ flooding does not complete before round $\Omega(n^{2/3}/k^{1/3})$.

[Din+15] Michael Dinitz et al. "Smoothed Analysis of Dynamic Networks". In: Proceedings of the 29th International Symposium on Distributed Computing - Volume 9363. DISC 2015. Tokyo, Japan: Springer-Verlag, 2015, pp. 513–527. isbn: 9783662486528. doi: 10.1007/978-3-662-48653-5_34. url: https://doi.org/10.1007/978-3-662-48653-5_34.



Saturday, November 6, 2021

Just a cool theorem about monotonic subsequences.

Below is a pretty neat theorem which basically states that sufficiently long sequences must have long monotonic subsequences.

Erdős–Szekeres theorem:
Given any sequence of $N^2+1$ real numbers $n_1, n_2, ..., n_{N^2+1}$, there must be a monotonically increasing subsequence of length $N$ or a monotonically decreasing subsequence of length $N$.

proof. Label each $n_i$ with a pair $(a_i, b_i)$ where $a_i$ is the length of the largest monotonically increasing subsequence ending at $n_i$ and $b_i$ is the length of the largest monotonically decreasing subsequence ending at $n_i$.

Now suppose for contradiction that the longest monotonically increasing and decreasing subsequences of $n_1, n_2, ..., n_{N^2+1}$ have length $\leq N$.

Obs 1: We have $N^2$ possible pairs $(a_i, b_i)$.

proof. We assumed that each $a_i, b_i \leq N$, the observation then follows.

Obs 2: For each $1\leq i<j\leq N^2+1$ we have that $(a_i, b_i) \not= (a_j, b_j)$.

proof. Fix any $1\leq i<N^2+1$, with pair $(a_i, b_i)$. For any $j>i$ we have two possible cases:
  1. Either $n_i \leq n_j$, or
  2. $n_i\geq n_j$.
Case $(1)$ implies that $a_i\not= a_j$ since we can extend the largest monotonically increasing subsequence ending at $n_i$ by adding $n_j$ on the end. Similarly, case $(2)$ implies that $b_i\not= b_j$.

Thus for the sequence $n_1, n_2, ..., n_{N^2+1}$, observation $2$ implies there are $N^2+1$ distinct pairs $(a_i, b_i)$, but observation $1$ states that we only have $N^2$ pairs to choose from. Thus there must be two identical pairs by the pigeonhole principle, a contradiction. This completes the proof.  
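The labeling used in the proof is easy to compute with a quadratic DP. A sketch of my own:

```python
def es_labels(seq):
    # (a_i, b_i): length of the longest monotonically increasing /
    # decreasing subsequence ending at position i.
    n = len(seq)
    a, b = [1] * n, [1] * n
    for i in range(n):
        for j in range(i):
            if seq[j] <= seq[i]:
                a[i] = max(a[i], a[j] + 1)
            if seq[j] >= seq[i]:
                b[i] = max(b[i], b[j] + 1)
    return list(zip(a, b))
```

On $N^2+1=5$ distinct numbers ($N=2$) the five labels are pairwise distinct, so some coordinate exceeds $N$, exactly as the pigeonhole argument predicts.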

Properties of Smoothed Connected Graphs

We can imagine that smoothing a dynamic graph could introduce edges which are helpful, i.e. they allow our algorithms to terminate faster. Conversely, it could remove, or fail to add, edges which allow our algorithms to terminate fast. Since smoothing is probabilistic, we want to see with what probability smoothing is helpful or not. This can help us design probabilistic algorithms for smoothed dynamic graphs. We give a proof sketch for a useful lemma given in [Din+15] and include some other related lemmas for completeness.

Lemma 1: Consider any graph $G\in \text{allowed}(\mathcal{G}_{\text{conn}})$ and any non-empty set $S$ of potential edges to be added to $G$ and smoothing value $k\leq n/16$ with $k|S| \leq n^2/2$. Then with probability at least $c_1 k |S|/n^2$, the $k$-smoothed graph $G'$ of $G$ contains at least one edge from $S$, for some constant $c_1 > 0$.

So what is this lemma saying? Suppose that we select the set of edges $S$ to be a set of helpful edges, i.e. having them would help our algorithms terminate faster. Then this lemma claims that it is fairly probable for $G'$ to include at least one of these helpful edges.

proof sketch. We first note that $k$-smoothing $G$ is the same as replacing it with a graph $G'$ sampled uniformly from $\text{editdist}(G, k) \cap \text{allowed}(\mathcal{G}_{\text{conn}})$.

If we denote $B$ as the event where $G'$ is connected and $A$ as the event where $G'$ contains at least one edge from $S$, then we are looking for $\mathbb{P}(A|B)$. So we want to lower bound this probability. We can do this using the following inequality $\mathbb{P}(A|B) \geq \mathbb{P}(A \cap B)$.

We split the proof into two cases. Either $G$ contains an edge $e\in S$ or it does not.

Case 1: $G$ contains an edge $e\in S$.

We want to find the probability that $e$ remains in $G'$, but also we need $G'$ to be connected. Thus consider "any old" spanning tree of $G$, call it $T$. We denote the event that $G'$ contains $T\cup \{e\}$ as $C$. If $C$ occurs then $G'$ is connected and includes an edge from $S$. Hence $\mathbb{P}(A \cap B) \geq \mathbb{P}(C)$. Our new aim is then to lower bound $\mathbb{P}(C)$.

Consider the process of $k$-smoothing $G$. It is essentially the same as choosing up to $k$ distinct edges from the ${n \choose 2}$ possible edges to toggle. When toggling the first of the $k$ edges we have an $n/{n \choose 2}$ chance of toggling an edge in $T\cup \{e\}$. For the second toggle we have at most $n/({n \choose 2}-1)$ probability of toggling an edge in $T\cup \{e\}$. So taking a union bound over the events that the $i^{\text{th}}$ edge toggled is part of $T\cup \{e\}$ we have at most $\sum_{i=0}^{k-1}{n/({n \choose 2}-i)}$ probability of toggling at least one edge in $T\cup \{e\}$.

Obs 1: $\sum_{i=0}^{k-1}{n/({n \choose 2}-i)} \leq 2kn/{n \choose 2}$ for $k \leq {n \choose 2}/2$.

Hence we have $\mathbb{P}(\text{not } C) \leq 2kn/{n \choose 2}$. Thus $\mathbb{P}(C) \geq 1-2kn/{n \choose 2} \geq 1/2$ for $k\leq n/16$.

Case 2: $G$ does not contain an edge $e\in S$.

As before, let $T$ be any spanning tree of $G$ and let $D$ denote the event that $G'$ contains $T$. Now $\mathbb{P}(A \cap B) \geq \mathbb{P}(A\cap D)=\mathbb{P}(A|D)\mathbb{P}(D)$. Observe that $\mathbb{P}(D) \geq \mathbb{P}(C) \geq 1/2$ since the scenario where $G'$ contains $T$ includes the scenario where $G'$ contains $T\cup \{e\}$. Hence $\mathbb{P}(A\cap D)\geq\mathbb{P}(A|D)/2$. Finally, we notice that if we are given that $D$ occurred then we have ${n \choose 2}-(n-1)$ edges to choose $e$ from, whereas if we are not given $D$, then we have ${n \choose 2}$ edges to choose $e$ from. Thus $\mathbb{P}(A|D)\geq\mathbb{P}(A)$, and so we have the inequality chain
$$\mathbb{P}(A \cap B) \geq \mathbb{P}(A\cap D)\geq\mathbb{P}(A|D)/2\geq \mathbb{P}(A)/2$$
So our new aim is to lower bound $\mathbb{P}(A)$. When $k$-smoothing $G$, we toggle up to $k$ of the ${n \choose 2}$ possible edges of $G$. If we end up toggling $i$ edges we have
$$\mathbb{P}(\text{not }A | \text{ toggle }i \text{ edges})=\left(1-\frac{|S|}{{n \choose 2}}\right)\left(1-\frac{|S|}{{n \choose 2}-1}\right)...\left(1-\frac{|S|}{{n \choose 2}-(i-1)}\right)\leq \left(1-\frac{|S|}{{n \choose 2}}\right)^i$$
If we knew $\mathbb{P}(\text{toggle } i \text{ edges})$, then we could use the following to finish the proof
$$\mathbb{P}(A)\geq\mathbb{P}(A\cap \text{toggle } i \text{ edges})=(1-\mathbb{P}(\text{not }A | \text{ toggle }i \text{ edges}))\mathbb{P}(\text{toggle } i \text{ edges})$$
By $k$-smoothing $G$, we have $\sum_{i=0}^{k}{{{n \choose 2}\choose i}}$ possible $k$-smoothed graphs $G'$ we could get. Observe that for $k\leq {n\choose 2}/2$ the terms in this summation are increasing. Thus we have that
$$2\sum_{i=\lceil{k/2\rceil}}^{k}{{{n \choose 2}\choose i}}\geq \sum_{i=0}^{k}{{{n \choose 2}\choose i}}$$
This means that at least half of the possible graphs $G'$ have $k/2\leq i\leq k$ toggles, so $\mathbb{P}(\text{toggle } \geq k/2 \text{ edges})\geq 1/2$. And if we toggle at least $k/2$ edges, we have
$$\mathbb{P}(\text{not }A | \text{ toggle } \geq k/2 \text{ edges})\leq \left(1-\frac{|S|}{{n \choose 2}}\right)^{k/2}\leq 1-\frac{(k/4)|S|}{{n \choose 2}}$$
and so
$$\mathbb{P}(A | \text{ toggle } \geq k/2 \text{ edges}) \geq \frac{(k/4)|S|}{{n \choose 2}}$$
Then 
$$\mathbb{P}(A)\geq\mathbb{P}(A \cap (\text{toggle } \geq k/2 \text{ edges}))\geq (1/8)k|S|/{n \choose 2} \geq (1/8)k|S|/n^2$$
which concludes the proof.
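One step in the argument above, namely that $\sum_{i=\lceil k/2\rceil}^{k}{{n \choose 2}\choose i}$ makes up at least half of $\sum_{i=0}^{k}{{n \choose 2}\choose i}$, can be checked numerically. A small helper of my own, not from [Din+15]:

```python
from math import comb, ceil

def upper_half_dominates(num_edges, k):
    # Check 2 * sum_{i=ceil(k/2)}^{k} C(E, i) >= sum_{i=0}^{k} C(E, i),
    # which holds because the binomial terms increase while k <= E/2.
    total = sum(comb(num_edges, i) for i in range(k + 1))
    upper = sum(comb(num_edges, i) for i in range(ceil(k / 2), k + 1))
    return 2 * upper >= total
```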

Below is a similar lemma to lemma 1, which states that it is unlikely that a helpful edge $e\in S$ will be removed from $G$ during $k$-smoothing.

Lemma 2: There exists constant $c_2 > 0$ such that the following holds. Consider any graph $G = (V, E)\in\text{allowed}(\mathcal{G}_{\text{conn}})$. Consider also any nonempty set $S \subseteq E$ of edges in the graph and smoothing value $k \leq n/16$. Then with probability at most $c_2k|S|/n^2$, the $k$-smoothed graph $G'$ removes an edge from $S$.

Finally, we have a lemma which will be useful when arguing for $k$-smoothed lower bounds.

Lemma 3: There exists constant $c_3 > 0$ such that the following holds. Consider any graph $G = (V, E)\in\text{allowed}(\mathcal{G}_{\text{conn}})$. Consider also any set $S$ of edges and smoothing value $k \leq n/16$ such that $S \cap E = \emptyset$. Then with probability at most $c_3k|S|/n^2$ , the $k$-smoothed graph $G'$ of $G$ contains an edge from $S$.

Lemma 1 says that with probability at least $c_1 k |S|/n^2$, $G'$ will contain some edge from $S$, and lemma 3 says that with probability at most $c_3k |S|/n^2$, $G'$ will contain some edge from $S$. Together they pin the probability that a helpful edge is added during smoothing to $\Theta(k|S|/n^2)$.

[Din+15] Michael Dinitz et al. "Smoothed Analysis of Dynamic Networks". In: Proceedings of the 29th International Symposium on Distributed Computing - Volume 9363. DISC 2015. Tokyo, Japan: Springer-Verlag, 2015, pp. 513–527. isbn: 9783662486528. doi: 10.1007/978-3-662-48653-5_34. url: https://doi.org/10.1007/978-3-662-48653-5_34.

Thursday, November 4, 2021

CEOI 2020: Fancy fence

Problem statement: 

Everybody knows that Balázs has the fanciest fence in the whole town. It’s built up from $N$ fancy sections. The sections are rectangles standing closely next to each other on the ground. The $i^{\text{th}}$ section has integer height $h_i$ and integer width $w_i$ . We are looking for fancy rectangles on this fancy fence. A rectangle is fancy if:

  • its sides are either horizontal or vertical and have integer lengths
  • the distance between the rectangle and the ground is integer
  • the distance between the rectangle and the left side of the first section is integer
  • it’s lying completely on sections 
What is the number of fancy rectangles? This number can be very big, so we are interested in it modulo $10^9 + 7$.

Input: The first line contains $N$, the number of sections. The second line contains $N$ space-separated integers, the $i^{\text{th}}$ number is $h_i$ . The third line contains $N$ space-separated integers, the $i^{\text{th}}$ number is $w_i$.

Constraints:

$1 \leq N \leq 10^5$

$1 \leq h_i, w_i \leq 10^9$

Time limit: 0.1 s

Memory limit: 32 MiB

My approach:

It seems reasonable to first figure out how many fancy rectangles a single $h_i\times w_i$ fancy section contains. Draw the section on the integer grid: a fancy rectangle inside it is uniquely determined by choosing $2$ of the $h_i+1$ horizontal grid lines and $2$ of the $w_i+1$ vertical grid lines. Thus we have
$$F(h_i, w_i)={h_i + 1 \choose 2}\cdot{w_i + 1 \choose 2}$$
fancy rectangles for an $h_i\times w_i$ fancy section.
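A sketch of the per-section count, with the modulus from the statement (my own code):

```python
from math import comb

MOD = 10**9 + 7

def fancy_in_section(h, w):
    # Choose 2 of the h+1 horizontal grid lines and 2 of the w+1 vertical
    # grid lines; each choice determines exactly one fancy rectangle.
    return comb(h + 1, 2) * comb(w + 1, 2) % MOD
```

For example, a $2\times 2$ section contains ${3 \choose 2}^2 = 9$ fancy rectangles: four $1\times 1$, two $1\times 2$, two $2\times 1$, and one $2\times 2$.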

When we place fancy sections next to each other in a line, we can have fancy rectangles positioned across two or more fancy sections. Thus we have to count these too. Consider the following example input where all $h_i<h_{i+1}$.

[Figure: each square is $1\times 1$ and the purple lines indicate the separation between consecutive fancy sections.]

Let $W=w_1+w_2+...+w_N$. Notice $F(h_2, W-w_1)$ counts all fancy rectangles counted by $F(h_1, W)$ except the rectangles that start in the first fancy section. We use this to get an answer for when $h_i<h_{i+1}$.

$$\sum_{i=1}^{N}{F(h_i, W-w_{i-1})-F(h_{i-1}, W-w_{i-1})}$$

where $w_0=h_0=0$. But what about the general case where we are not given that $h_i<h_{i+1}$?

AtCoder shenanigans

Here is the problem, and below is the raw transcript of my thoughts. Obs: It is always optimal to use all $K$ moves. intuitive proof. While...