Interval Sum Counting Algorithm

Category: Algorithms ..

Problem Set 4

Problem 1

$I = (0,1),(0,3),(4,5),(2,5)(0, 1), (0, 3), (4, 5), (2, 5)$

These intervals are already sorted by finish time.

  1. Initially, there are no classrooms allocated, so I0I_0 is placed in C0C_0.
  2. Next, I1I_1 is incompatible with I0I_0, so it is placed in a newly allocated C1C_1.
  3. I2I_2 is compatible with C0C_0 and C1C_1, so it is placed in C0C_0.
  4. I3I_3 isn't compatible with C0C_0 or C1C_1 (because of I2I_2 and I1I_1 respectively), so it is placed in a newly allocated C2C_2

So we have...

  • C0=I0,I2C_0 = { I_0, I_2 }
  • C1=I1C_1 = { I_1 }
  • C2=I3C_2 = { I_3 }

As you can see below, the maximum depth is 2, so this algorithm allocated more than max-depth classrooms.


I_1  +-----------------+     +-----+  I_2
I_0  +-----+     +-----------------+  I_3

     +-----+-----+-----+-----+-----+
     0     1     2     3     4     5

Problem 2

Algorithm

  • Let SS be the (initially empty) set of pairs
  • Sort A=a1,a2,,anA = a_1, a_2, \ldots, a_n in increasing order to get B=b1,b2,,bnB = b_1, b_2, \ldots, b_n.
  • While there are elements remaining, pick the smallest and largest element (the first and last bi,bjb_i, b_j), and remove them from BB.
  • Add bi,bj{b_i, b_j} to SS
  • Once BB is empty, return SS.
# O(nlog(n))
def min_max_pairs(A):
  A.sort()

  l, r = 0, len(A) - 1
  res = -infinity
  while l < r:
    res = max(res, A[l] + A[r])
    l += 1
    r -= 1
  return res

Running Time

Sorting takes O(nlog(n))O(n\log(n)). Then, we perform n2\frac{n}{2} iterations of our while loop, each requiring constant work. This gives us an overall runtime of O(nlog(n))O(n\log(n))

Correctness: Greedy Stays Ahead

For notational convenience, let a1,a2,,ana_1, a_2, \ldots, a_n be sorted. Let GG be my algorithm, and XX be the optimum, and for the sake of contradiction, suppose XX chooses pairs differently from GG, e.g. not of the form (ai,ani)(a_i, a_{n - i}) as GG does.

Let g(ai)g(a_i) be a function that outputs the "choice" of partner for any aia_i by GG, so g(ai)=anig(a_i) = a_{n - i}, and let x(ai)x(a_i) mean the same for XX.

Define P(n)P(n) as...

Given nn sorted numbers (with nn being even) a1,ana_1, \ldots a_n, GmaxXmaxG_{max} \le X_{max}, where GmaxG_{max} and XmaxX_{max} are defined as follows:

Gmax=max(g(ai)+ai:1in) G_{max} = max({ g(a_i) + a_i : 1 \le i \le n })

Xmax=max(x(ai)+ai:1in) X_{max} = max({ x(a_i) + a_i : 1 \le i \le n })

Base Case:

P(2)P(2): we have a1,a2a_1, a_2, with a1<a2a_1 < a_2. There is only one possible pair, so GmaxXmaxG_{max} \le X_{max}, and P(2)P(2) holds.

IH: Suppose P(k2)P(k - 2) holds.

IS:

Let A=a1,a2,,ak1,akA = a_1, a_2, \ldots, a_{k - 1}, a_k be sorted numbers. Remove a1a_1 and aka_k to get A=a2,,ak1A' = a_2, \ldots, a_{k - 1}. Since AA' is still sorted, and has A=k2|A'| = k - 2, we know P(k2)P(k - 2) holds.

Therefore, we have $G'{max} \le X'for for A'$.

Consider an arbitrary pair chosen by GG on AA', (ai,aj)(a_i, a_j), with aiaja_i \le a_j. By our sort order, we have a1aiajaka_1 \le a_i \le a_j \le a_k. Now, consider the ways we could swap elements between these two pairs

Case 1: a1+ak>ai+aja_1 + a_k > a_i + a_j

  • (a1,aj),(ai,ak)(a_1, a_j), (a_i, a_k)
    • ai+aka1+aka_i + a_k \ge a_1 + a_k, since aia1a_i \ge a_1
  • (a1,ai),(ak,aj)(a_1, a_i), (a_k, a_j)
    • ak+aja1+aka_k + a_j \ge a_1 + a_k, since ajaia_j \ge a_i

Case 2: a1+akai+aja_1 + a_k \le a_i + a_j

  • (a1,aj),(ai,ak)(a_1, a_j), (a_i, a_k)
    • ai+akai+aja_i + a_k \ge a_i + a_j, since akaja_k \ge a_j
  • (a1,ai),(ak,aj)(a_1, a_i), (a_k, a_j)
    • ak+ajai+aja_k + a_j \ge a_i + a_j, since akaia_k \ge a_i

In either of the above cases, we are increasing the max of the two sums by swapping elements, and this holds for an arbitrary ai,aja_i, a_j chosen by GG', which is at least as good as the optimum. Since we know $X'{max} \ge G',andso, and so P(k)$ holds.}$, and also that any pairing other than the one picked by GG (a1,aka_1, a_k) leads to an increased sum, it must be the case that $G_{max} \le X_{max

Problem 3

Lemma (1): For any two trees T1T_1 and T2T_2 with vT1v \in T_1 and uT2u \in T_2, if we add the edge (u,v)(u, v) between the two trees, we get a new tree T3T_3.

Proof:

Let nn be the number of vertices in T1T_1, and mm be the number of vertices in T2T_2. Since they are both trees, we have n1n - 1 edges in T1T_1, and m1m - 1 edges in T2T_2. Connecting T1T_1 and T2T_2 on (u,v)(u, v) then results in a graph with m+nm + n vertices, and m1+n1=m+n1m - 1 + n - 1 = m + n - 1 edges.

Furthermore, both T1T_1 and T2T_2 are connected. Therefore, after adding an the edge (u,v)(u, v) between them, this results in a graph that is also connected, since all vertices in T1T_1 can reach all vertices in T2T_2 through this edge, and vice versa.

Finally, since the resulting graph is connected and has m+n1m + n - 1 edges, it must be a tree.

Lemma (2): For any tree TT removing an edge e=(u,v)e = (u, v) results in two trees.

Proof:

Let TT be an arbitrary tree and e=(u,v)e = (u, v) be an arbitrary edge in TT.

Removing the edge ee from TT results in two separate connected components, each containing a subset of the vertices from TT.

The resulting components have no cycles, since TT was a tree, and is therefore acyclic, and removing an edge cannot create a cycle.

Each component is connected, as every pair of vertices in TT was connected by a unique path (since TT is a tree), and removing ee does not disconnect any other pairs of vertices within each component.

Therefore, the two components formed by cutting TT on ee are both trees.

Lemma (3): For any partition of vertices of a tree TT into two sets AA and BB, there must be an edge (u,vu, v) between some vertex uAu \in A and vBv \in B.

Proof:

Let TT be a tree and A,Bbeapartitionofverticesin{ A, B } be a partition of vertices in T.Supposeforthesakeofcontradictionthattherewasnoedgebetweenverticesin. Suppose for the sake of contradiction that there was no edge between vertices in Aand and B$.

Let xAx \in A and yBy \in B be arbitrary vertices. Since TT is a tree, it is connected. Therefore, there must exist a path xyx \to y, so at some point along this path a vertex in AA must be connected to a vertex in BB, which is a contradiction.

Main Proof:

Let T1T_1 and T2T_2 be edge disjoint spanning trees over GG. Consider an arbitrary edge e=(u,v)T1e = (u, v) \in T_1.

Let T1=T1eT_1' = T_1 - e. Since T1T_1 was a tree, this splits T1T_1' into two connected components C1,C2C_1, C_2, both of which are also trees by (2). We have uC1u \in C_1 and vC2v \in C_2.

Since T2T_2 is also a tree, and is therefore connected, by (3), there must exist an edge f=(x,y)T2f = (x, y) \in T_2, such that xC1x \in C_1 and yC2y \in C_2. For any two vertices in a tree, there is exactly one path between them, since otherwise there would either be a non-connected vertex (no path), or a cycle (more than one path). We know (u,v)T2(u, v) \notin T_2, since T1T_1 and T2T_2 are edge-disjoint, so letting u,p1,p2,,vu, p_1, p_2, \ldots, v be the path between uu and vv in T2T_2, choose f=(u,p1)f = (u, p_1) (where p1p_1 may or may not be vv).

Cutting T2T_2 on ff to get T2T_2', we have two connected components K1K_1, K2K_2, both of which are trees. Now, we can add ee to T2T_2', and by (1), we get a tree, since K1K_1 and K2K_2 are both trees, with uK1u \in K_1 and vK2v \in K_2.

Similarly, adding f=(x,y)f = (x, y) to T1T_1', we also get a tree by (1), since C1C_1 is a tree with xC1x \in C_1, and C2C_2 is a tree with yC2y \in C_2.

Therefore, we have both T1e+fT_1 - e + f and T2f+eT_2 - f + e are trees.

Problem 5

Call my algorithm S(A,l,u)S(A, l, u).

Correctness

Claim: Given a list of nn numbers A=a1,an,anA = a_1, a_n, \ldots a_n, the number of interval sums in [l,u]\lbrack l, u \rbrack is correctly calculated by S(A,l,u)=R+L+CS(A, l, u) = R + L + C, where they are defined as...

L=S([a1,,an/21],l,u)L = S([a_1, \ldots, a_{\lceil n/2 \rceil - 1}], l, u)

i.e. the number of interval sums in the first half of AA [l,u]\lbrack l, u \rbrack.

R=S([an/2,,an],l,u)R = S([a_{\lceil n/2 \rceil}, \ldots, a_n], l, u)

i.e. the number of interval sums in the latter half of AA in [l,u]\lbrack l, u \rbrack.

C=(i,j):in/21jn/2I(i,j)[l,u]C = | { (i, j) : i \le \lceil n/2 \rceil - 1 \land j \ge \lceil n/2 \rceil \land I(i, j) \in [l, u] } |

i.e the number of interval sums that cross over the midpoint of AA and are in [l,u]\lbrack l, u \rbrack

Proof: By induction

Let P(n)P(n) be the above claim, i.e. that given nn numbers AA and a range [l,u]\lbrack l, u \rbrack, my algorithm returns the number of interval sums on AA that are between [l,u]\lbrack l, u \rbrack via the following calculation:

S(A,l,u)=L+R+CS(A, l, u) = L + R + C

Base Case:

P(1)P(1)

Given a single number, there is only one interval sum to consider, i.e. I(1,1)I(1, 1). In this case, the left half of AA is empty, and the right half consists of a single element. There are no interval sums that cross over the midpoint, so if lI(1,1)ul \le I(1, 1) \le u, then L=1L = 1, and otherwise, L=0L = 0. This gives us a final answer of S(A,l,u)=L+R+C=0+R+0=RS(A, l, u) = L + R + C = 0 + R + 0 = R. There is only one possible interval to consider, and RR is set depending on whether it is in [l,u]\lbrack l, u \rbrack, so this is correct.

P(2)P(2)

Given two numbers A=a1,a2A = a_1, a_2, we have 33 intervals to check corresponding to I(1,1)I(1, 1), I(2,2)I(2, 2), and I(1,2)I(1, 2). We have that L=1L = 1 if $I(1, 1) \in l,ul, u,and, and 0otherwise, otherwise, R = 1if if I(2, 2) \in l,ul, u,and, and 0otherwise,and otherwise, and C = 1if if I(1, 2) \in l,ul, u,and, and 0otherwise.Therefore,wecheckallpossibleintervalsums,andoutputthesumof otherwise. Therefore, we check all possible interval sums, and output the sum of L, R, C,whichcorrespondstothenumberoftheseintervalsumsin, which corresponds to the number of these interval sums in \lbrack l, u \rbrack$.

I.H.: Suppose P(1)P(n/2)P(n1)P(1) \land \ldots \land P(\lceil n/2 \rceil) \land \ldots \land P(n - 1)

I.S.:

Let A=a1,a2,,anA = a_1, a_2, \ldots, a_n, and l,ul, u be arbitrary numbers such that l<ul < u. Let m=n/2m = \lceil n/2 \rceil, and define A1=a1,,am1A_1 = a_1, \ldots, a_{m - 1}, and A2=am,,anA_2 = a_m, \ldots, a_n.

Since A1n1|A_1| \le n - 1, and A2n1|A_2| \le n - 1, we have that P(A1)P(|A_1|) and P(A2)P(|A_2|) holds. Thus, L=S(A1,l,u)L = S(A_1, l, u) and R=S(A2,l,u)R = S(A_2, l, u), which are the total number of interval sums in A1A_1 and A2A_2 respectively.

Considering the set $U \subset nn^2ofeverypossibleintervalover of every possible interval over A,wecanpartition, we can partition U$ into two sets, ones that don't cross over the midpoint, McM^c, and those that do cross over the midpoint, MM. Note that U=MMcU = M \cup M^c, and MMc=M \cap M^c = \emptyset.

By our inductive hypothesis, we have...

L+R=(i,j):(i,j)McI(i,j)[l,u]L + R = |{ (i, j) : (i, j) \in M^c \land I(i, j) \in [l, u]}|

since we considered precisely the intervals in each side of AA, which are exactly the intervals that don't cross the midpoint, and counted the number of such intervals that have $I(i, j) \in l,ul, u$.

To find $|{ (i, j) : (i, j) \in M \land I(i, j) \in l,ul, u}| = C,wemustcheckeachsuitablepair, we must check each suitable pair (i, j)thathas that has i < m \le j.Wecanfirstconsidereach. We can first consider each I(m, j)for for j \ge m,andthenfindallsuch, and then find all such I(i, m - 1)for for i \le m - 1suchthat such that I(m, j) + I(i, m - 1) \in l,ul, u.Notethatwedontneedtoconsiderallelementsof. Note that we don't need to consider all elements of M,sincewecanskipthosethatdonthave, since we can skip those that don't have I(i, j) \in l,ul, u$.

My algorithm does this by first calculating each I(i,m1)I(i, m - 1) and sorting them into a list HH. Then, it iterates through all I(m,j)I(m, j) for mjnm \le j \le n, and searches for the lowest and highest index imin,imaxi_{min}, i_{max} in HH such that $H_i + I(m, j) \in l,ul, u.Wethencalculate. We then calculate Cbymaintainingarunningsumof by maintaining a running sum of i_{max} - i_{min} + 1,iethenumberofsuitable, ie the number of suitable isforagiven's for a given j.Thisisguaranteedtoconsidereachrelevant. This is guaranteed to consider each relevant i < m \le j,since, since Hissorted,andsoallindicesbetweenandincluding is sorted, and so all indices between and including \lbrack i_{min}, i_{max} \rbrackwouldalsobevalidprefixestotherighthalfoftheintervalsweareconsidering,andthoseoutsideof would also be valid prefixes to the right half of the intervals we are considering, and those outside of \lbrack i_{min}, i_{max} \rbrackwouldcorrespondtoanintervalsumthatisoutside would correspond to an interval sum that is outside \lbrack l, u \rbrack$.

Therefore, we have...

C=(i,j):(i,j)MI(i,j)[l,u]C = |{ (i, j) : (i, j) \in M \land I(i, j) \in [l, u]}|

And so S(A,l,u)=L+R+CS(A, l, u) = L + R + C correctly considers all relevant interval sums, counting the number of interval sums between [l,u]\lbrack l, u \rbrack.

def count_interval_sums_bruteforce(A, I, l, u, low, high):
  ans = 0
  for i in range(low, high):
    for j in range(i, high):
      s = I(i, j)
      if s <= u and s >= l:
        ans += 1
  return ans

def count_interval_sums_recursive(A, I, l, u, low, high):
  if high - low <= 2:
    return count_interval_sums_bruteforce(A, I, l, u, low, high)

  mid = low + (high - low)//2
  left = count_interval_sums_recursive(A, I, l, u, low, mid)
  right = count_interval_sums_recursive(A, I, l, u, mid, high)

  total_sum = sum(A[low:high])
  left_sum = sum(A[low:mid])
  right_sum = total_sum - left_sum

  prefix = prefix_sum(A[low:high])
  suffix = suffix_sum(A[low:high])

  possible_left_halfs = sorted(
      [total - prefix[i] - right_sum for i, s in enumerate(suffix[:mid])]
  )

  cross = 0

  for i in range(len(prefix)//2, len(prefix)):
    right_half = right_sum - suffix[i]

    min_left_half = l - right_half
    max_left_half = u - right_half

    j = possible_left_halfs.binary_search(min_left_half)
    k = possible_left_halfs.binary_search(max_left_half)

    cross += k - j + 1

  return left + right + cross


def count_interval_sums(A, l, u):
  n = len(A)
  prefix, suffix = [0] * (n+1), [0] * (n+1)
  pre = 0
  total = sum(A)
  for i in range(n):
    prefix[i] = pre
    pre += A[i]
    suffix[i] = total - pre

  def I(i, j):
    return total - prefix[i] - suffix[j]

  return count_interval_sums_recursive(A, I, l, u, 0, len(A))