Interval Sum Counting Algorithm
Problem Set 4
Problem 1
$I = [(0, 1), (0, 3), (4, 5), (2, 5)]$
These intervals are already sorted by finish time.
- Initially, there are no classrooms allocated, so $I_0$ is placed in $C_0$.
- Next, $I_1$ is incompatible with $I_0$, so it is placed in a newly allocated $C_1$.
- $I_2$ is compatible with $C_0$ and $C_1$, so it is placed in $C_0$.
- $I_3$ isn't compatible with $C_0$ or $C_1$ (because of $I_2$ and $I_1$ respectively), so it is placed in a newly allocated $C_2$
So we have...
- $C_0 = { I_0, I_2 }$
- $C_1 = { I_1 }$
- $C_2 = { I_3 }$
As you can see below, the maximum depth is 2, so this algorithm allocated more than max-depth classrooms.
I_1 +-----------------+ +-----+ I_2
I_0 +-----+ +-----------------+ I_3
+-----+-----+-----+-----+-----+
0 1 2 3 4 5
Problem 2
Algorithm
- Let $S$ be the (initially empty) set of pairs
- Sort $A = a_1, a_2, \ldots, a_n$ in increasing order to get $B = b_1, b_2, \ldots, b_n$.
- While there are elements remaining, pick the smallest and largest element (the first and last $b_i, b_j$), and remove them from $B$.
- Add ${b_i, b_j}$ to $S$
- Once $B$ is empty, return $S$.
# O(nlog(n))
def min_max_pairs(A):
A.sort()
l, r = 0, len(A) - 1
res = -infinity
while l < r:
res = max(res, A[l] + A[r])
l += 1
r -= 1
return res
Running Time
Sorting takes $O(n\log(n))$. Then, we perform $\frac{n}{2}$ iterations of our while loop, each requiring constant work. This gives us an overall runtime of $O(n\log(n))$
Correctness: Greedy Stays Ahead
For notational convenience, let $a_1, a_2, \ldots, a_n$ be sorted. Let $G$ be my algorithm, and $X$ be the optimum, and for the sake of contradiction, suppose $X$ chooses pairs differently from $G$, e.g. not of the form $(a_i, a_{n - i})$ as $G$ does.
Let $g(a_i)$ be a function that outputs the "choice" of partner for any $a_i$ by $G$, so $g(a_i) = a_{n - i}$, and let $x(a_i)$ mean the same for $X$.
Define $P(n)$ as...
Given $n$ sorted numbers (with $n$ being even) $a_1, \ldots a_n$, $G_{max} \le X_{max}$, where $G_{max}$ and $X_{max}$ are defined as follows:
$$ G_{max} = max({ g(a_i) + a_i : 1 \le i \le n }) $$
$$ X_{max} = max({ x(a_i) + a_i : 1 \le i \le n }) $$
Base Case:
$P(2)$: we have $a_1, a_2$, with $a_1 < a_2$. There is only one possible pair, so $G_{max} \le X_{max}$, and $P(2)$ holds.
IH: Suppose $P(k - 2)$ holds.
IS:
Let $A = a_1, a_2, \ldots, a_{k - 1}, a_k$ be sorted numbers. Remove $a_1$ and $a_k$ to get $A' = a_2, \ldots, a_{k - 1}$. Since $A'$ is still sorted, and has $|A'| = k - 2$, we know $P(k - 2)$ holds.
Therefore, we have $G'{max} \le X'$ for $A'$.
Consider an arbitrary pair chosen by $G$ on $A'$, $(a_i, a_j)$, with $a_i \le a_j$. By our sort order, we have $a_1 \le a_i \le a_j \le a_k$. Now, consider the ways we could swap elements between these two pairs
Case 1: $a_1 + a_k > a_i + a_j$
- $(a_1, a_j), (a_i, a_k)$
- $a_i + a_k \ge a_1 + a_k$, since $a_i \ge a_1$
- $(a_1, a_i), (a_k, a_j)$
- $a_k + a_j \ge a_1 + a_k$, since $a_j \ge a_i$
Case 2: $a_1 + a_k \le a_i + a_j$
- $(a_1, a_j), (a_i, a_k)$
- $a_i + a_k \ge a_i + a_j$, since $a_k \ge a_j$
- $(a_1, a_i), (a_k, a_j)$
- $a_k + a_j \ge a_i + a_j$, since $a_k \ge a_i$
In either of the above cases, we are increasing the max of the two sums by swapping elements, and this holds for an arbitrary $a_i, a_j$ chosen by $G'$, which is at least as good as the optimum. Since we know $X'{max} \ge G'$, and so $P(k)$ holds.}$, and also that any pairing other than the one picked by $G$ ($a_1, a_k$) leads to an increased sum, it must be the case that $G_{max} \le X_{max
Problem 3
Lemma (1): For any two trees $T_1$ and $T_2$ with $v \in T_1$ and $u \in T_2$, if we add the edge $(u, v)$ between the two trees, we get a new tree $T_3$.
Proof:
Let $n$ be the number of vertices in $T_1$, and $m$ be the number of vertices in $T_2$. Since they are both trees, we have $n - 1$ edges in $T_1$, and $m - 1$ edges in $T_2$. Connecting $T_1$ and $T_2$ on $(u, v)$ then results in a graph with $m + n$ vertices, and $m - 1 + n - 1 = m + n - 1$ edges.
Furthermore, both $T_1$ and $T_2$ are connected. Therefore, after adding an the edge $(u, v)$ between them, this results in a graph that is also connected, since all vertices in $T_1$ can reach all vertices in $T_2$ through this edge, and vice versa.
Finally, since the resulting graph is connected and has $m + n - 1$ edges, it must be a tree.
Lemma (2): For any tree $T$ removing an edge $e = (u, v)$ results in two trees.
Proof:
Let $T$ be an arbitrary tree and $e = (u, v)$ be an arbitrary edge in $T$.
Removing the edge $e$ from $T$ results in two separate connected components, each containing a subset of the vertices from $T$.
The resulting components have no cycles, since $T$ was a tree, and is therefore acyclic, and removing an edge cannot create a cycle.
Each component is connected, as every pair of vertices in $T$ was connected by a unique path (since $T$ is a tree), and removing $e$ does not disconnect any other pairs of vertices within each component.
Therefore, the two components formed by cutting $T$ on $e$ are both trees.
Lemma (3): For any partition of vertices of a tree $T$ into two sets $A$ and $B$, there must be an edge ($u, v$) between some vertex $u \in A$ and $v \in B$.
Proof:
Let $T$ be a tree and ${ A, B } be a partition of vertices in $T$. Suppose for the sake of contradiction that there was no edge between vertices in $A$ and $B$.
Let $x \in A$ and $y \in B$ be arbitrary vertices. Since $T$ is a tree, it is connected. Therefore, there must exist a path $x \to y$, so at some point along this path a vertex in $A$ must be connected to a vertex in $B$, which is a contradiction.
Main Proof:
Let $T_1$ and $T_2$ be edge disjoint spanning trees over $G$. Consider an arbitrary edge $e = (u, v) \in T_1$.
Let $T_1' = T_1 - e$. Since $T_1$ was a tree, this splits $T_1'$ into two connected components $C_1, C_2$, both of which are also trees by (2). We have $u \in C_1$ and $v \in C_2$.
Since $T_2$ is also a tree, and is therefore connected, by (3), there must exist an edge $f = (x, y) \in T_2$, such that $x \in C_1$ and $y \in C_2$. For any two vertices in a tree, there is exactly one path between them, since otherwise there would either be a non-connected vertex (no path), or a cycle (more than one path). We know $(u, v) \notin T_2$, since $T_1$ and $T_2$ are edge-disjoint, so letting $u, p_1, p_2, \ldots, v$ be the path between $u$ and $v$ in $T_2$, choose $f = (u, p_1)$ (where $p_1$ may or may not be $v$).
Cutting $T_2$ on $f$ to get $T_2'$, we have two connected components $K_1$, $K_2$, both of which are trees. Now, we can add $e$ to $T_2'$, and by (1), we get a tree, since $K_1$ and $K_2$ are both trees, with $u \in K_1$ and $v \in K_2$.
Similarly, adding $f = (x, y)$ to $T_1'$, we also get a tree by (1), since $C_1$ is a tree with $x \in C_1$, and $C_2$ is a tree with $y \in C_2$.
Therefore, we have both $T_1 - e + f$ and $T_2 - f + e$ are trees.
Problem 5
Call my algorithm $S(A, l, u)$.
Correctness
Claim: Given a list of $n$ numbers $A = a_1, a_n, \ldots a_n$, the number of interval sums in $\lbrack l, u \rbrack$ is correctly calculated by $S(A, l, u) = R + L + C$, where they are defined as...
$$L = S([a_1, \ldots, a_{\lceil n/2 \rceil - 1}], l, u)$$
i.e. the number of interval sums in the first half of $A$ $\lbrack l, u \rbrack$.
$$R = S([a_{\lceil n/2 \rceil}, \ldots, a_n], l, u)$$
i.e. the number of interval sums in the latter half of $A$ in $\lbrack l, u \rbrack$.
$$C = | { (i, j) : i \le \lceil n/2 \rceil - 1 \land j \ge \lceil n/2 \rceil \land I(i, j) \in [l, u] } |$$
i.e the number of interval sums that cross over the midpoint of $A$ and are in $\lbrack l, u \rbrack$
Proof: By induction
Let $P(n)$ be the above claim, i.e. that given $n$ numbers $A$ and a range $\lbrack l, u \rbrack$, my algorithm returns the number of interval sums on $A$ that are between $\lbrack l, u \rbrack$ via the following calculation:
$$S(A, l, u) = L + R + C$$
Base Case:
$P(1)$
Given a single number, there is only one interval sum to consider, i.e. $I(1, 1)$. In this case, the left half of $A$ is empty, and the right half consists of a single element. There are no interval sums that cross over the midpoint, so if $l \le I(1, 1) \le u$, then $L = 1$, and otherwise, $L = 0$. This gives us a final answer of $S(A, l, u) = L + R + C = 0 + R + 0 = R$. There is only one possible interval to consider, and $R$ is set depending on whether it is in $\lbrack l, u \rbrack$, so this is correct.
$P(2)$
Given two numbers $A = a_1, a_2$, we have $3$ intervals to check corresponding to $I(1, 1)$, $I(2, 2)$, and $I(1, 2)$. We have that $L = 1$ if $I(1, 1) \in [l, u]$, and $0$ otherwise, $R = 1$ if $I(2, 2) \in [l, u]$, and $0$ otherwise, and $C = 1$ if $I(1, 2) \in [l, u]$, and $0$ otherwise. Therefore, we check all possible interval sums, and output the sum of $L, R, C$, which corresponds to the number of these interval sums in $\lbrack l, u \rbrack$.
I.H.: Suppose $P(1) \land \ldots \land P(\lceil n/2 \rceil) \land \ldots \land P(n - 1)$
I.S.:
Let $A = a_1, a_2, \ldots, a_n$, and $l, u$ be arbitrary numbers such that $l < u$. Let $m = \lceil n/2 \rceil$, and define $A_1 = a_1, \ldots, a_{m - 1}$, and $A_2 = a_m, \ldots, a_n$.
Since $|A_1| \le n - 1$, and $|A_2| \le n - 1$, we have that $P(|A_1|)$ and $P(|A_2|)$ holds. Thus, $L = S(A_1, l, u)$ and $R = S(A_2, l, u)$, which are the total number of interval sums in $A_1$ and $A_2$ respectively.
Considering the set $U \subset [n]^2$ of every possible interval over $A$, we can partition $U$ into two sets, ones that don't cross over the midpoint, $M^c$, and those that do cross over the midpoint, $M$. Note that $U = M \cup M^c$, and $M \cap M^c = \emptyset$.
By our inductive hypothesis, we have...
$$L + R = |{ (i, j) : (i, j) \in M^c \land I(i, j) \in [l, u]}|$$
since we considered precisely the intervals in each side of $A$, which are exactly the intervals that don't cross the midpoint, and counted the number of such intervals that have $I(i, j) \in [l, u]$.
To find $|{ (i, j) : (i, j) \in M \land I(i, j) \in [l, u]}| = C$, we must check each suitable pair $(i, j)$ that has $i < m \le j$. We can first consider each $I(m, j)$ for $j \ge m$, and then find all such $I(i, m - 1)$ for $i \le m - 1$ such that $I(m, j) + I(i, m - 1) \in [l, u]$. Note that we don't need to consider all elements of $M$, since we can skip those that don't have $I(i, j) \in [l, u]$.
My algorithm does this by first calculating each $I(i, m - 1)$ and sorting them into a list $H$. Then, it iterates through all $I(m, j)$ for $m \le j \le n$, and searches for the lowest and highest index $i_{min}, i_{max}$ in $H$ such that $H_i + I(m, j) \in [l, u]$. We then calculate $C$ by maintaining a running sum of $i_{max} - i_{min} + 1$, ie the number of suitable $i$'s for a given $j$. This is guaranteed to consider each relevant $i < m \le j$, since $H$ is sorted, and so all indices between and including $\lbrack i_{min}, i_{max} \rbrack$ would also be valid prefixes to the right half of the intervals we are considering, and those outside of $\lbrack i_{min}, i_{max} \rbrack$ would correspond to an interval sum that is outside $\lbrack l, u \rbrack$.
Therefore, we have...
$$C = |{ (i, j) : (i, j) \in M \land I(i, j) \in [l, u]}|$$
And so $S(A, l, u) = L + R + C$ correctly considers all relevant interval sums, counting the number of interval sums between $\lbrack l, u \rbrack$.
def count_interval_sums_bruteforce(A, I, l, u, low, high):
ans = 0
for i in range(low, high):
for j in range(i, high):
s = I(i, j)
if s <= u and s >= l:
ans += 1
return ans
def count_interval_sums_recursive(A, I, l, u, low, high):
if high - low <= 2:
return count_interval_sums_bruteforce(A, I, l, u, low, high)
mid = low + (high - low)//2
left = count_interval_sums_recursive(A, I, l, u, low, mid)
right = count_interval_sums_recursive(A, I, l, u, mid, high)
total_sum = sum(A[low:high])
left_sum = sum(A[low:mid])
right_sum = total_sum - left_sum
prefix = prefix_sum(A[low:high])
suffix = suffix_sum(A[low:high])
possible_left_halfs = sorted(
[total - prefix[i] - right_sum for i, s in enumerate(suffix[:mid])]
)
cross = 0
for i in range(len(prefix)//2, len(prefix)):
right_half = right_sum - suffix[i]
min_left_half = l - right_half
max_left_half = u - right_half
j = possible_left_halfs.binary_search(min_left_half)
k = possible_left_halfs.binary_search(max_left_half)
cross += k - j + 1
return left + right + cross
def count_interval_sums(A, l, u):
n = len(A)
prefix, suffix = [0] * (n+1), [0] * (n+1)
pre = 0
total = sum(A)
for i in range(n):
prefix[i] = pre
pre += A[i]
suffix[i] = total - pre
def I(i, j):
return total - prefix[i] - suffix[j]
return count_interval_sums_recursive(A, I, l, u, 0, len(A))