Interval Sum Counting Algorithm

Last modified: 2025-02-11 Category: Algorithms ..

Problem Set 4

Problem 1

$I = $(0, 1), (0, 3), (4, 5), (2, 5)$ $

These intervals are already sorted by finish time.

Initially, there are no classrooms allocated, so $I_0$ is placed in $C_0$ .
Next, $I_1$ is incompatible with $I_0$ , so it is placed in a newly allocated $C_1$ .
$I_2$ is compatible with $C_0$ and $C_1$ , so it is placed in $C_0$ .
$I_3$ isn't compatible with $C_0$ or $C_1$ (because of $I_2$ and $I_1$ respectively), so it is placed in a newly allocated $C_2$

So we have...

$C_0 = { I_0, I_2 }$
$C_1 = { I_1 }$
$C_2 = { I_3 }$

As you can see below, the maximum depth is 2, so this algorithm allocated more than max-depth classrooms.


I_1  +-----------------+     +-----+  I_2
I_0  +-----+     +-----------------+  I_3

     +-----+-----+-----+-----+-----+
     0     1     2     3     4     5

Problem 2

Algorithm

Let $S$ be the (initially empty) set of pairs
Sort $A = a_1, a_2, \ldots, a_n$ in increasing order to get $B = b_1, b_2, \ldots, b_n$ .
While there are elements remaining, pick the smallest and largest element (the first and last $b_i, b_j$ ), and remove them from $B$ .
Add ${b_i, b_j}$ to $S$
Once $B$ is empty, return $S$ .

# O(nlog(n))
def min_max_pairs(A):
  A.sort()

  l, r = 0, len(A) - 1
  res = -infinity
  while l < r:
    res = max(res, A[l] + A[r])
    l += 1
    r -= 1
  return res

Running Time

Sorting takes $O(n\log(n))$ . Then, we perform $\frac{n}{2}$ iterations of our while loop, each requiring constant work. This gives us an overall runtime of $O(n\log(n))$

Correctness: Greedy Stays Ahead

For notational convenience, let $a_1, a_2, \ldots, a_n$ be sorted. Let $G$ be my algorithm, and $X$ be the optimum, and for the sake of contradiction, suppose $X$ chooses pairs differently from $G$ , e.g. not of the form $(a_i, a_{n - i})$ as $G$ does.

Let $g(a_i)$ be a function that outputs the "choice" of partner for any $a_i$ by $G$ , so $g(a_i) = a_{n - i}$ , and let $x(a_i)$ mean the same for $X$ .

Define $P(n)$ as...

Given $n$ sorted numbers (with $n$ being even) $a_1, \ldots a_n$ , $G_{max} \le X_{max}$ , where $G_{max}$ and $X_{max}$ are defined as follows:

$G_{max} = max({ g(a_i) + a_i : 1 \le i \le n })$

$X_{max} = max({ x(a_i) + a_i : 1 \le i \le n })$

Base Case:

$P(2)$ : we have $a_1, a_2$ , with $a_1 < a_2$ . There is only one possible pair, so $G_{max} \le X_{max}$ , and $P(2)$ holds.

IH: Suppose $P(k - 2)$ holds.

IS:

Let $A = a_1, a_2, \ldots, a_{k - 1}, a_k$ be sorted numbers. Remove $a_1$ and $a_k$ to get $A' = a_2, \ldots, a_{k - 1}$ . Since $A'$ is still sorted, and has $|A'| = k - 2$ , we know $P(k - 2)$ holds.

Therefore, we have $G'{max} \le X' $for$ A'$.

Consider an arbitrary pair chosen by $G$ on $A'$ , $(a_i, a_j)$ , with $a_i \le a_j$ . By our sort order, we have $a_1 \le a_i \le a_j \le a_k$ . Now, consider the ways we could swap elements between these two pairs

Case 1: $a_1 + a_k > a_i + a_j$

$(a_1, a_j), (a_i, a_k)$
- $a_i + a_k \ge a_1 + a_k$ , since $a_i \ge a_1$
$(a_1, a_i), (a_k, a_j)$
- $a_k + a_j \ge a_1 + a_k$ , since $a_j \ge a_i$

Case 2: $a_1 + a_k \le a_i + a_j$

$(a_1, a_j), (a_i, a_k)$
- $a_i + a_k \ge a_i + a_j$ , since $a_k \ge a_j$
$(a_1, a_i), (a_k, a_j)$
- $a_k + a_j \ge a_i + a_j$ , since $a_k \ge a_i$

In either of the above cases, we are increasing the max of the two sums by swapping elements, and this holds for an arbitrary $a_i, a_j$ chosen by $G'$ , which is at least as good as the optimum. Since we know $X'{max} \ge G' $, and so$ P(k)$ holds.}$, and also that any pairing other than the one picked by $G$ ( $a_1, a_k$ ) leads to an increased sum, it must be the case that $G_{max} \le X_{max

Problem 3

Lemma (1): For any two trees $T_1$ and $T_2$ with $v \in T_1$ and $u \in T_2$ , if we add the edge $(u, v)$ between the two trees, we get a new tree $T_3$ .

Proof:

Let $n$ be the number of vertices in $T_1$ , and $m$ be the number of vertices in $T_2$ . Since they are both trees, we have $n - 1$ edges in $T_1$ , and $m - 1$ edges in $T_2$ . Connecting $T_1$ and $T_2$ on $(u, v)$ then results in a graph with $m + n$ vertices, and $m - 1 + n - 1 = m + n - 1$ edges.

Furthermore, both $T_1$ and $T_2$ are connected. Therefore, after adding an the edge $(u, v)$ between them, this results in a graph that is also connected, since all vertices in $T_1$ can reach all vertices in $T_2$ through this edge, and vice versa.

Finally, since the resulting graph is connected and has $m + n - 1$ edges, it must be a tree.

Lemma (2): For any tree $T$ removing an edge $e = (u, v)$ results in two trees.

Proof:

Let $T$ be an arbitrary tree and $e = (u, v)$ be an arbitrary edge in $T$ .

Removing the edge $e$ from $T$ results in two separate connected components, each containing a subset of the vertices from $T$ .

The resulting components have no cycles, since $T$ was a tree, and is therefore acyclic, and removing an edge cannot create a cycle.

Each component is connected, as every pair of vertices in $T$ was connected by a unique path (since $T$ is a tree), and removing $e$ does not disconnect any other pairs of vertices within each component.

Therefore, the two components formed by cutting $T$ on $e$ are both trees.

Lemma (3): For any partition of vertices of a tree $T$ into two sets $A$ and $B$ , there must be an edge ( $u, v$ ) between some vertex $u \in A$ and $v \in B$ .

Proof:

Let $T$ be a tree and ${ A, B } be a partition of vertices in$ T $. Suppose for the sake of contradiction that there was no edge between vertices in$ A $and$ B$.

Let $x \in A$ and $y \in B$ be arbitrary vertices. Since $T$ is a tree, it is connected. Therefore, there must exist a path $x \to y$ , so at some point along this path a vertex in $A$ must be connected to a vertex in $B$ , which is a contradiction.

Main Proof:

Let $T_1$ and $T_2$ be edge disjoint spanning trees over $G$ . Consider an arbitrary edge $e = (u, v) \in T_1$ .

Let $T_1' = T_1 - e$ . Since $T_1$ was a tree, this splits $T_1'$ into two connected components $C_1, C_2$ , both of which are also trees by (2). We have $u \in C_1$ and $v \in C_2$ .

Since $T_2$ is also a tree, and is therefore connected, by (3), there must exist an edge $f = (x, y) \in T_2$ , such that $x \in C_1$ and $y \in C_2$ . For any two vertices in a tree, there is exactly one path between them, since otherwise there would either be a non-connected vertex (no path), or a cycle (more than one path). We know $(u, v) \notin T_2$ , since $T_1$ and $T_2$ are edge-disjoint, so letting $u, p_1, p_2, \ldots, v$ be the path between $u$ and $v$ in $T_2$ , choose $f = (u, p_1)$ (where $p_1$ may or may not be $v$ ).

Cutting $T_2$ on $f$ to get $T_2'$ , we have two connected components $K_1$ , $K_2$ , both of which are trees. Now, we can add $e$ to $T_2'$ , and by (1), we get a tree, since $K_1$ and $K_2$ are both trees, with $u \in K_1$ and $v \in K_2$ .

Similarly, adding $f = (x, y)$ to $T_1'$ , we also get a tree by (1), since $C_1$ is a tree with $x \in C_1$ , and $C_2$ is a tree with $y \in C_2$ .

Therefore, we have both $T_1 - e + f$ and $T_2 - f + e$ are trees.

Problem 5

Call my algorithm $S(A, l, u)$ .

Correctness

Claim: Given a list of $n$ numbers $A = a_1, a_n, \ldots a_n$ , the number of interval sums in $\lbrack l, u \rbrack$ is correctly calculated by $S(A, l, u) = R + L + C$ , where they are defined as...

$L = S([a_1, \ldots, a_{\lceil n/2 \rceil - 1}], l, u)$

i.e. the number of interval sums in the first half of $A$ $\lbrack l, u \rbrack$ .

$R = S([a_{\lceil n/2 \rceil}, \ldots, a_n], l, u)$

i.e. the number of interval sums in the latter half of $A$ in $\lbrack l, u \rbrack$ .

$C = | { (i, j) : i \le \lceil n/2 \rceil - 1 \land j \ge \lceil n/2 \rceil \land I(i, j) \in [l, u] } |$

i.e the number of interval sums that cross over the midpoint of $A$ and are in $\lbrack l, u \rbrack$

Proof: By induction

Let $P(n)$ be the above claim, i.e. that given $n$ numbers $A$ and a range $\lbrack l, u \rbrack$ , my algorithm returns the number of interval sums on $A$ that are between $\lbrack l, u \rbrack$ via the following calculation:

$S(A, l, u) = L + R + C$

Base Case:

$P(1)$

Given a single number, there is only one interval sum to consider, i.e. $I(1, 1)$ . In this case, the left half of $A$ is empty, and the right half consists of a single element. There are no interval sums that cross over the midpoint, so if $l \le I(1, 1) \le u$ , then $L = 1$ , and otherwise, $L = 0$ . This gives us a final answer of $S(A, l, u) = L + R + C = 0 + R + 0 = R$ . There is only one possible interval to consider, and $R$ is set depending on whether it is in $\lbrack l, u \rbrack$ , so this is correct.

$P(2)$

Given two numbers $A = a_1, a_2$ , we have $3$ intervals to check corresponding to $I(1, 1)$ , $I(2, 2)$ , and $I(1, 2)$ . We have that $L = 1$ if $I(1, 1) \in $l, u$ $, and$ 0 $otherwise,$ R = 1 $if$ I(2, 2) \in $l, u$ $, and$ 0 $otherwise, and$ C = 1 $if$ I(1, 2) \in $l, u$ $, and$ 0 $otherwise. Therefore, we check all possible interval sums, and output the sum of$ L, R, C $, which corresponds to the number of these interval sums in$ \lbrack l, u \rbrack$.

I.H.: Suppose $P(1) \land \ldots \land P(\lceil n/2 \rceil) \land \ldots \land P(n - 1)$

I.S.:

Let $A = a_1, a_2, \ldots, a_n$ , and $l, u$ be arbitrary numbers such that $l < u$ . Let $m = \lceil n/2 \rceil$ , and define $A_1 = a_1, \ldots, a_{m - 1}$ , and $A_2 = a_m, \ldots, a_n$ .

Since $|A_1| \le n - 1$ , and $|A_2| \le n - 1$ , we have that $P(|A_1|)$ and $P(|A_2|)$ holds. Thus, $L = S(A_1, l, u)$ and $R = S(A_2, l, u)$ , which are the total number of interval sums in $A_1$ and $A_2$ respectively.

Considering the set $U \subset $n$ ^2 $of every possible interval over$ A $, we can partition$ U$ into two sets, ones that don't cross over the midpoint, $M^c$ , and those that do cross over the midpoint, $M$ . Note that $U = M \cup M^c$ , and $M \cap M^c = \emptyset$ .

By our inductive hypothesis, we have...

$L + R = |{ (i, j) : (i, j) \in M^c \land I(i, j) \in [l, u]}|$

since we considered precisely the intervals in each side of $A$ , which are exactly the intervals that don't cross the midpoint, and counted the number of such intervals that have $I(i, j) \in $l, u$ $.

To find $|{ (i, j) : (i, j) \in M \land I(i, j) \in $l, u$ }| = C $, we must check each suitable pair$ (i, j) $that has$ i < m \le j $. We can first consider each$ I(m, j) $for$ j \ge m $, and then find all such$ I(i, m - 1) $for$ i \le m - 1 $such that$ I(m, j) + I(i, m - 1) \in $l, u$ $. Note that we don't need to consider all elements of$ M $, since we can skip those that don't have$ I(i, j) \in $l, u$ $.

My algorithm does this by first calculating each $I(i, m - 1)$ and sorting them into a list $H$ . Then, it iterates through all $I(m, j)$ for $m \le j \le n$ , and searches for the lowest and highest index $i_{min}, i_{max}$ in $H$ such that $H_i + I(m, j) \in $l, u$ $. We then calculate$ C $by maintaining a running sum of$ i_{max} - i_{min} + 1 $, ie the number of suitable$ i $'s for a given$ j $. This is guaranteed to consider each relevant$ i < m \le j $, since$ H $is sorted, and so all indices between and including$ \lbrack i_{min}, i_{max} \rbrack $would also be valid prefixes to the right half of the intervals we are considering, and those outside of$ \lbrack i_{min}, i_{max} \rbrack $would correspond to an interval sum that is outside$ \lbrack l, u \rbrack$.

Therefore, we have...

$C = |{ (i, j) : (i, j) \in M \land I(i, j) \in [l, u]}|$

And so $S(A, l, u) = L + R + C$ correctly considers all relevant interval sums, counting the number of interval sums between $\lbrack l, u \rbrack$ .

def count_interval_sums_bruteforce(A, I, l, u, low, high):
  ans = 0
  for i in range(low, high):
    for j in range(i, high):
      s = I(i, j)
      if s <= u and s >= l:
        ans += 1
  return ans

def count_interval_sums_recursive(A, I, l, u, low, high):
  if high - low <= 2:
    return count_interval_sums_bruteforce(A, I, l, u, low, high)

  mid = low + (high - low)//2
  left = count_interval_sums_recursive(A, I, l, u, low, mid)
  right = count_interval_sums_recursive(A, I, l, u, mid, high)

  total_sum = sum(A[low:high])
  left_sum = sum(A[low:mid])
  right_sum = total_sum - left_sum

  prefix = prefix_sum(A[low:high])
  suffix = suffix_sum(A[low:high])

  possible_left_halfs = sorted(
      [total - prefix[i] - right_sum for i, s in enumerate(suffix[:mid])]
  )

  cross = 0

  for i in range(len(prefix)//2, len(prefix)):
    right_half = right_sum - suffix[i]

    min_left_half = l - right_half
    max_left_half = u - right_half

    j = possible_left_halfs.binary_search(min_left_half)
    k = possible_left_halfs.binary_search(max_left_half)

    cross += k - j + 1

  return left + right + cross


def count_interval_sums(A, l, u):
  n = len(A)
  prefix, suffix = [0] * (n+1), [0] * (n+1)
  pre = 0
  total = sum(A)
  for i in range(n):
    prefix[i] = pre
    pre += A[i]
    suffix[i] = total - pre

  def I(i, j):
    return total - prefix[i] - suffix[j]

  return count_interval_sums_recursive(A, I, l, u, 0, len(A))