Interval Sum Counting Algorithm

Category: Algorithms

Problem Set 4

Problem 1

$I = [(0, 1), (0, 3), (4, 5), (2, 5)]$

These intervals are already sorted by finish time.

  1. Initially, there are no classrooms allocated, so $I_0$ is placed in $C_0$.
  2. Next, $I_1$ is incompatible with $I_0$, so it is placed in a newly allocated $C_1$.
  3. $I_2$ is compatible with $C_0$ and $C_1$, so it is placed in $C_0$.
  4. $I_3$ isn't compatible with $C_0$ or $C_1$ (because of $I_2$ and $I_1$ respectively), so it is placed in a newly allocated $C_2$

So we have...

  • $C_0 = { I_0, I_2 }$
  • $C_1 = { I_1 }$
  • $C_2 = { I_3 }$

As you can see below, the maximum depth is 2, so this algorithm allocated more than max-depth classrooms.


I_1  +-----------------+     +-----+  I_2
I_0  +-----+     +-----------------+  I_3

     +-----+-----+-----+-----+-----+
     0     1     2     3     4     5

Problem 2

Algorithm

  • Let $S$ be the (initially empty) set of pairs
  • Sort $A = a_1, a_2, \ldots, a_n$ in increasing order to get $B = b_1, b_2, \ldots, b_n$.
  • While there are elements remaining, pick the smallest and largest element (the first and last $b_i, b_j$), and remove them from $B$.
  • Add ${b_i, b_j}$ to $S$
  • Once $B$ is empty, return $S$.
# O(nlog(n))
def min_max_pairs(A):
  A.sort()

  l, r = 0, len(A) - 1
  res = -infinity
  while l < r:
    res = max(res, A[l] + A[r])
    l += 1
    r -= 1
  return res

Running Time

Sorting takes $O(n\log(n))$. Then, we perform $\frac{n}{2}$ iterations of our while loop, each requiring constant work. This gives us an overall runtime of $O(n\log(n))$

Correctness: Greedy Stays Ahead

For notational convenience, let $a_1, a_2, \ldots, a_n$ be sorted. Let $G$ be my algorithm, and $X$ be the optimum, and for the sake of contradiction, suppose $X$ chooses pairs differently from $G$, e.g. not of the form $(a_i, a_{n - i})$ as $G$ does.

Let $g(a_i)$ be a function that outputs the "choice" of partner for any $a_i$ by $G$, so $g(a_i) = a_{n - i}$, and let $x(a_i)$ mean the same for $X$.

Define $P(n)$ as...

Given $n$ sorted numbers (with $n$ being even) $a_1, \ldots a_n$, $G_{max} \le X_{max}$, where $G_{max}$ and $X_{max}$ are defined as follows:

$$ G_{max} = max({ g(a_i) + a_i : 1 \le i \le n }) $$

$$ X_{max} = max({ x(a_i) + a_i : 1 \le i \le n }) $$

Base Case:

$P(2)$: we have $a_1, a_2$, with $a_1 < a_2$. There is only one possible pair, so $G_{max} \le X_{max}$, and $P(2)$ holds.

IH: Suppose $P(k - 2)$ holds.

IS:

Let $A = a_1, a_2, \ldots, a_{k - 1}, a_k$ be sorted numbers. Remove $a_1$ and $a_k$ to get $A' = a_2, \ldots, a_{k - 1}$. Since $A'$ is still sorted, and has $|A'| = k - 2$, we know $P(k - 2)$ holds.

Therefore, we have $G'{max} \le X'$ for $A'$.

Consider an arbitrary pair chosen by $G$ on $A'$, $(a_i, a_j)$, with $a_i \le a_j$. By our sort order, we have $a_1 \le a_i \le a_j \le a_k$. Now, consider the ways we could swap elements between these two pairs

Case 1: $a_1 + a_k > a_i + a_j$

  • $(a_1, a_j), (a_i, a_k)$
  • $a_i + a_k \ge a_1 + a_k$, since $a_i \ge a_1$
  • $(a_1, a_i), (a_k, a_j)$
  • $a_k + a_j \ge a_1 + a_k$, since $a_j \ge a_i$

Case 2: $a_1 + a_k \le a_i + a_j$

  • $(a_1, a_j), (a_i, a_k)$
  • $a_i + a_k \ge a_i + a_j$, since $a_k \ge a_j$
  • $(a_1, a_i), (a_k, a_j)$
  • $a_k + a_j \ge a_i + a_j$, since $a_k \ge a_i$

In either of the above cases, we are increasing the max of the two sums by swapping elements, and this holds for an arbitrary $a_i, a_j$ chosen by $G'$, which is at least as good as the optimum. Since we know $X'{max} \ge G'$, and so $P(k)$ holds.}$, and also that any pairing other than the one picked by $G$ ($a_1, a_k$) leads to an increased sum, it must be the case that $G_{max} \le X_{max

Problem 3

Lemma (1): For any two trees $T_1$ and $T_2$ with $v \in T_1$ and $u \in T_2$, if we add the edge $(u, v)$ between the two trees, we get a new tree $T_3$.

Proof:

Let $n$ be the number of vertices in $T_1$, and $m$ be the number of vertices in $T_2$. Since they are both trees, we have $n - 1$ edges in $T_1$, and $m - 1$ edges in $T_2$. Connecting $T_1$ and $T_2$ on $(u, v)$ then results in a graph with $m + n$ vertices, and $m - 1 + n - 1 = m + n - 1$ edges.

Furthermore, both $T_1$ and $T_2$ are connected. Therefore, after adding an the edge $(u, v)$ between them, this results in a graph that is also connected, since all vertices in $T_1$ can reach all vertices in $T_2$ through this edge, and vice versa.

Finally, since the resulting graph is connected and has $m + n - 1$ edges, it must be a tree.

Lemma (2): For any tree $T$ removing an edge $e = (u, v)$ results in two trees.

Proof:

Let $T$ be an arbitrary tree and $e = (u, v)$ be an arbitrary edge in $T$.

Removing the edge $e$ from $T$ results in two separate connected components, each containing a subset of the vertices from $T$.

The resulting components have no cycles, since $T$ was a tree, and is therefore acyclic, and removing an edge cannot create a cycle.

Each component is connected, as every pair of vertices in $T$ was connected by a unique path (since $T$ is a tree), and removing $e$ does not disconnect any other pairs of vertices within each component.

Therefore, the two components formed by cutting $T$ on $e$ are both trees.

Lemma (3): For any partition of vertices of a tree $T$ into two sets $A$ and $B$, there must be an edge ($u, v$) between some vertex $u \in A$ and $v \in B$.

Proof:

Let $T$ be a tree and ${ A, B } be a partition of vertices in $T$. Suppose for the sake of contradiction that there was no edge between vertices in $A$ and $B$.

Let $x \in A$ and $y \in B$ be arbitrary vertices. Since $T$ is a tree, it is connected. Therefore, there must exist a path $x \to y$, so at some point along this path a vertex in $A$ must be connected to a vertex in $B$, which is a contradiction.

Main Proof:

Let $T_1$ and $T_2$ be edge disjoint spanning trees over $G$. Consider an arbitrary edge $e = (u, v) \in T_1$.

Let $T_1' = T_1 - e$. Since $T_1$ was a tree, this splits $T_1'$ into two connected components $C_1, C_2$, both of which are also trees by (2). We have $u \in C_1$ and $v \in C_2$.

Since $T_2$ is also a tree, and is therefore connected, by (3), there must exist an edge $f = (x, y) \in T_2$, such that $x \in C_1$ and $y \in C_2$. For any two vertices in a tree, there is exactly one path between them, since otherwise there would either be a non-connected vertex (no path), or a cycle (more than one path). We know $(u, v) \notin T_2$, since $T_1$ and $T_2$ are edge-disjoint, so letting $u, p_1, p_2, \ldots, v$ be the path between $u$ and $v$ in $T_2$, choose $f = (u, p_1)$ (where $p_1$ may or may not be $v$).

Cutting $T_2$ on $f$ to get $T_2'$, we have two connected components $K_1$, $K_2$, both of which are trees. Now, we can add $e$ to $T_2'$, and by (1), we get a tree, since $K_1$ and $K_2$ are both trees, with $u \in K_1$ and $v \in K_2$.

Similarly, adding $f = (x, y)$ to $T_1'$, we also get a tree by (1), since $C_1$ is a tree with $x \in C_1$, and $C_2$ is a tree with $y \in C_2$.

Therefore, we have both $T_1 - e + f$ and $T_2 - f + e$ are trees.

Problem 5

Call my algorithm $S(A, l, u)$.

Correctness

Claim: Given a list of $n$ numbers $A = a_1, a_n, \ldots a_n$, the number of interval sums in $\lbrack l, u \rbrack$ is correctly calculated by $S(A, l, u) = R + L + C$, where they are defined as...

$$L = S([a_1, \ldots, a_{\lceil n/2 \rceil - 1}], l, u)$$

i.e. the number of interval sums in the first half of $A$ $\lbrack l, u \rbrack$.

$$R = S([a_{\lceil n/2 \rceil}, \ldots, a_n], l, u)$$

i.e. the number of interval sums in the latter half of $A$ in $\lbrack l, u \rbrack$.

$$C = | { (i, j) : i \le \lceil n/2 \rceil - 1 \land j \ge \lceil n/2 \rceil \land I(i, j) \in [l, u] } |$$

i.e the number of interval sums that cross over the midpoint of $A$ and are in $\lbrack l, u \rbrack$

Proof: By induction

Let $P(n)$ be the above claim, i.e. that given $n$ numbers $A$ and a range $\lbrack l, u \rbrack$, my algorithm returns the number of interval sums on $A$ that are between $\lbrack l, u \rbrack$ via the following calculation:

$$S(A, l, u) = L + R + C$$

Base Case:

$P(1)$

Given a single number, there is only one interval sum to consider, i.e. $I(1, 1)$. In this case, the left half of $A$ is empty, and the right half consists of a single element. There are no interval sums that cross over the midpoint, so if $l \le I(1, 1) \le u$, then $L = 1$, and otherwise, $L = 0$. This gives us a final answer of $S(A, l, u) = L + R + C = 0 + R + 0 = R$. There is only one possible interval to consider, and $R$ is set depending on whether it is in $\lbrack l, u \rbrack$, so this is correct.

$P(2)$

Given two numbers $A = a_1, a_2$, we have $3$ intervals to check corresponding to $I(1, 1)$, $I(2, 2)$, and $I(1, 2)$. We have that $L = 1$ if $I(1, 1) \in [l, u]$, and $0$ otherwise, $R = 1$ if $I(2, 2) \in [l, u]$, and $0$ otherwise, and $C = 1$ if $I(1, 2) \in [l, u]$, and $0$ otherwise. Therefore, we check all possible interval sums, and output the sum of $L, R, C$, which corresponds to the number of these interval sums in $\lbrack l, u \rbrack$.

I.H.: Suppose $P(1) \land \ldots \land P(\lceil n/2 \rceil) \land \ldots \land P(n - 1)$

I.S.:

Let $A = a_1, a_2, \ldots, a_n$, and $l, u$ be arbitrary numbers such that $l < u$. Let $m = \lceil n/2 \rceil$, and define $A_1 = a_1, \ldots, a_{m - 1}$, and $A_2 = a_m, \ldots, a_n$.

Since $|A_1| \le n - 1$, and $|A_2| \le n - 1$, we have that $P(|A_1|)$ and $P(|A_2|)$ holds. Thus, $L = S(A_1, l, u)$ and $R = S(A_2, l, u)$, which are the total number of interval sums in $A_1$ and $A_2$ respectively.

Considering the set $U \subset [n]^2$ of every possible interval over $A$, we can partition $U$ into two sets, ones that don't cross over the midpoint, $M^c$, and those that do cross over the midpoint, $M$. Note that $U = M \cup M^c$, and $M \cap M^c = \emptyset$.

By our inductive hypothesis, we have...

$$L + R = |{ (i, j) : (i, j) \in M^c \land I(i, j) \in [l, u]}|$$

since we considered precisely the intervals in each side of $A$, which are exactly the intervals that don't cross the midpoint, and counted the number of such intervals that have $I(i, j) \in [l, u]$.

To find $|{ (i, j) : (i, j) \in M \land I(i, j) \in [l, u]}| = C$, we must check each suitable pair $(i, j)$ that has $i < m \le j$. We can first consider each $I(m, j)$ for $j \ge m$, and then find all such $I(i, m - 1)$ for $i \le m - 1$ such that $I(m, j) + I(i, m - 1) \in [l, u]$. Note that we don't need to consider all elements of $M$, since we can skip those that don't have $I(i, j) \in [l, u]$.

My algorithm does this by first calculating each $I(i, m - 1)$ and sorting them into a list $H$. Then, it iterates through all $I(m, j)$ for $m \le j \le n$, and searches for the lowest and highest index $i_{min}, i_{max}$ in $H$ such that $H_i + I(m, j) \in [l, u]$. We then calculate $C$ by maintaining a running sum of $i_{max} - i_{min} + 1$, ie the number of suitable $i$'s for a given $j$. This is guaranteed to consider each relevant $i < m \le j$, since $H$ is sorted, and so all indices between and including $\lbrack i_{min}, i_{max} \rbrack$ would also be valid prefixes to the right half of the intervals we are considering, and those outside of $\lbrack i_{min}, i_{max} \rbrack$ would correspond to an interval sum that is outside $\lbrack l, u \rbrack$.

Therefore, we have...

$$C = |{ (i, j) : (i, j) \in M \land I(i, j) \in [l, u]}|$$

And so $S(A, l, u) = L + R + C$ correctly considers all relevant interval sums, counting the number of interval sums between $\lbrack l, u \rbrack$.

def count_interval_sums_bruteforce(A, I, l, u, low, high):
  ans = 0
  for i in range(low, high):
    for j in range(i, high):
      s = I(i, j)
      if s <= u and s >= l:
        ans += 1
  return ans

def count_interval_sums_recursive(A, I, l, u, low, high):
  if high - low <= 2:
    return count_interval_sums_bruteforce(A, I, l, u, low, high)

  mid = low + (high - low)//2
  left = count_interval_sums_recursive(A, I, l, u, low, mid)
  right = count_interval_sums_recursive(A, I, l, u, mid, high)

  total_sum = sum(A[low:high])
  left_sum = sum(A[low:mid])
  right_sum = total_sum - left_sum

  prefix = prefix_sum(A[low:high])
  suffix = suffix_sum(A[low:high])

  possible_left_halfs = sorted(
      [total - prefix[i] - right_sum for i, s in enumerate(suffix[:mid])]
  )

  cross = 0

  for i in range(len(prefix)//2, len(prefix)):
    right_half = right_sum - suffix[i]

    min_left_half = l - right_half
    max_left_half = u - right_half

    j = possible_left_halfs.binary_search(min_left_half)
    k = possible_left_halfs.binary_search(max_left_half)

    cross += k - j + 1

  return left + right + cross


def count_interval_sums(A, l, u):
  n = len(A)
  prefix, suffix = [0] * (n+1), [0] * (n+1)
  pre = 0
  total = sum(A)
  for i in range(n):
    prefix[i] = pre
    pre += A[i]
    suffix[i] = total - pre

  def I(i, j):
    return total - prefix[i] - suffix[j]

  return count_interval_sums_recursive(A, I, l, u, 0, len(A))