4

I'm trying to understand this dynamic programming related problem, adapted from Kleinberg's Algorithm Design book. Not homework: i've already a solution, just considering if i'm ok with the theory.

Suppose we want to replicate a file over a collection of $n$ servers $S_1, S_2 , \ldots , S_n$. Each server $S_i$ has a placement cost $c_i$ to place a copy of the file at $S_i$. If a user requests the file from $S_i$ and no copy is present, then servers $S_{i+1}, S_{i+2}, S_{i+3}, \ldots$ are searched in order until a copy is found, say at $S_j$ (where $j > i$). This results in an access cost $a_i$, where $$ a_i= \begin{cases} j-i \qquad \hbox{if $S_i$ does not contain a copy the file} \\ 0 \qquad \hbox{ if $S_i$ contains a copy of the file} \end{cases} $$ We require that $S_n$ contains the file, so that all searches will terminate. Given $I \subset \left\{1,\ldots, n − 1\right\}$, a subset of servers to place the file on, we define the total cost

$$ c(I) = c_n + \sum_{i \in I} c_i + \sum_{i=1}^{n} a_i $$

To be the sum of the placement costs for each server that contains the file (recall that $S_n$ must contain the file), plus the sum of the access costs associated with all $n$ servers. Design a polynomial-time algorithm to compute a subset I of servers to contain the file that minimizes total cost.

My approach is to consider every possible choice, like for other problems e.g. segmented least squares, ending up with a $O(n^4)$ running time algorithm.

Anyway it looks like there is a better solution, given by the following recurrence:

$$ OPT(j) = c_j + \displaystyle\min_{0 \leq i < j}(OPT(i)+\binom{j-i}{2}) $$

With the initialitazions $OPT(0)= 0$ and $\binom{1}{2}=0$. The answer is given as usual by $OPT(n)$. This is to me very counterintuitive: why the binomial coefficient, and why the number $2$?.

Is it a standard way to approach similar problems?

kentilla
  • 302
  • 1
  • 13

1 Answers1

4

The binomial coefficient here appears through the formula $$ \sum_{i=1}^{n-1} i = \binom{n}{2}. $$ In the formula, $OPT(j)$ is the minimal cost of a solution for the first $j$ servers. Another way of looking it that it's the minimal cost up to $j$ of a solution which puts the file in $S_j$.

Where does the recurrence come form? Take an optimal solution for the first $i$ servers. We know that the file appears in $S_j$. Let $S_i$ be the next largest server it appears on (or $i=0$ if it only appears in $S_j$). Without loss of generality, we can assume that the solution restricted to $S_1,\ldots,S_i$ is optimal, and so of cost $OPT(i)$. Putting the file in $S_j$ costs $c_j$. Accessing the file at positions $i+1,\ldots,j-1$ costs $(j-(i+1)) + \cdots + (j-(j-1)) = \binom{j-i}{2}$. This gives the recurrence formula. (We also have to verify the case $i=0$ separately, which I leave to you.)

Yuval Filmus
  • 276,994
  • 27
  • 311
  • 503