JOIN
Get Time
statistics_w  Match Editorial
2007 TopCoder Open
Algorithm Round 1A

Saturday, April 7, 2007

Match summary

As everyone involved knew for a few days, this round was supposed to be the toughest of the Round 1 groups. With roughly the same number of reds as the other two groups combined, and with a rating of 1617 necessary to be expected to qualify, this surely was a tough crowd. One mistake could mean elimination, one challenge case could make the difference between advancing and dropping out of the tournament.

With 79 contestants solving the hard problem the expectations were fulfilled, if not exceeded. The round was tough and everyone tried their best. ACRush was leading the pack into the Challenge phase, but in the end he only placed second – all thanks to Petr's five successful challenges. The third place went to Per.

While the top of the field was full of the usual suspects, the rest of the division summary wasn't all red and yellow. Many lower-rated coders were able to gain more than enough points to beat the odds and advance to the next round. Congratulation to all the advancers, and see you all in Round 2!

The Problems

Turnpike rate it discuss it
Used as: Division One - Level One:
Value 250
Submission Rate 524 / 634 (82.65%)
Success Rate 329 / 524 (62.79%)
High Score yava for 247.66 points (2 mins 46 secs)
Average Score 182.55 (for 329 correct submissions)
This problem offered a wide variety of correct approaches, so everyone could choose the approach he or she is most comfortable with. We'll mention a few of these approaches.

Binary search for the answer

This is an optimization problem – the question we have to answer is of the form "find the smallest X such that a condition holds."

Moreover, the problem has got the property that if the condition holds for some X, it will also hold for all values greater than X.

Each time you find yourself in a similar situation, there is one natural question you should ask: "Given a fixed value X, can I find an efficient way to verify whether the condition holds for my X?"

Whenever the answer is positive, there is an efficient solution of the original task: Use binary search on the set of possible answers to find the smallest answer such that the condition holds.

In our case, we know that the answer is greater than 0, and it is at most equal to pikeLength. All that remains is finding an efficient way of checking whether there is a solution that has all gaps smaller or equal to a given X.

This is really simple. Imagine that we drive along the turnpike. Each time the current gap reaches X, we place a new plaza. In this way we will ensure that no gap exceeds X, and we will use the minimum number of new plazas. When we get to the end of the turnpike, we simply check whether the number of new plazas exceeds the given limit n or not.

C++ code of this solution follows:
class Turnpike {
public:
  bool check(int pikeLength, int n, const vector<int> &plazas, int X) {
    int used = 0;
    for (int i=1; i<int(plazas.size()); i++) {
      int thisGap = plazas[i]-plazas[i-1];
      while (thisGap > X) {
        used++;
        thisGap -= X;
      }
    }
    return (used <= n);
  }

  int unserviced(int pikeLength, int n, vector <int> plazas) {
    plazas.push_back(0); plazas.push_back(pikeLength);
    sort(plazas.begin(),plazas.end());
    int lo=0, hi=pikeLength;
    while (hi-lo > 1) {
      int med = (lo+hi)/2;
      if (check(pikeLength,n,plazas,med)) hi=med; else lo=med;
    }
    return hi;
  }
};

Linear search for the answer

In this case the set of possible answers was small. If someone was not comfortable with binary search, there was an easy way out: just try all the answers!

Binary search is nasty, and many less experienced coders fell into the trap of mishandling some boundary cases, and so this approach is smarter than it may seem at a first glance. Actually, the code below never took more than 1ms when I tested it in the practice room, so there was plenty of room for an even slower solution.

C++ code follows:
class Turnpike {
public:
  bool check(int pikeLength, int n, const vector<int> &plazas, int X) {
    int used = 0;
    for (int i=1; i<int(plazas.size()); i++) {
      int thisGap = plazas[i]-plazas[i-1];
      while (thisGap > X) {
        used++;
        thisGap -= X;
      }
    }
    return (used <= n);
  }

  int unserviced(int pikeLength, int n, vector <int> plazas) {
    plazas.push_back(0); plazas.push_back(pikeLength);
    sort(plazas.begin(),plazas.end());
    for (int i=1; i<=pikeLength; i++)
      if (check(pikeLength,n,plazas,i))
        return i;
  }
};

Playing it safe with dynamic programming

Using dynamic programming, we can try all possible plaza placements, and be sure to pick the optimal one.

All the subproblems we will be solving look as follows: given a segment of the turnpike and a maximum number of new plazas, what's the smallest possible maximal gap we can obtain?

When solving such a subproblem, consider the last plaza before the end of the current segment. This is either the last of the original plazas that lie inside the segment, or we placed a new plaza somewhere between that original plaza and the end of the segment. In each case, we know the length of the last gap, and for the rest of the segment we get a new, smaller subproblem of the same type.

C++ code using memoization:
int isPlaza[1024];
int memo[1024][108];

int solve(int L, int N) {
  // find the best solution for the segment 0..L
  // with at most N new plazas

  int &res = memo[L][N];
  if (res >= 0) return res;
  res = 987654321;
  if (L==0) return res=0;
  for (int prev=L-1; prev>=0; prev--) {
    if (isPlaza[prev]) {
      res = min(res,max(L-prev,solve(prev,N)));
    } else {
      if (N>0) {
        res = min(res,max(L-prev,solve(prev,N-1)));
      }
    }
    if (isPlaza[prev]) break;
  }
  return res;
}

class Turnpike {
public:
  int unserviced(int pikeLength, int n, vector <int> plazas) {
    plazas.push_back(0);
    plazas.push_back(pikeLength);
    sort(plazas.begin(),plazas.end());
    for (int i=0; i<int(plazas.size()); i++) isPlaza[plazas[i]]=1;
    memset(memo,-1,sizeof(memo));
    return solve(pikeLength,n);
  }
};

A greedy solution

One greedy solution can be based on the following observation: Once we know how many plazas are placed into each of the original gaps, we can easily find an optimal placement for them: in each gap they have to be spaced as evenly as possible. The resulting new gap will be equal to ceiling(original_gap / new_plazas + 1).

We will now sequentially assign each of the new plazas to one of the original gaps. The process will be simple: in each step, determine the original gap where the current gap is largest. This is the value we need to decrease, and so we assign one more new plaza to this original gap.

With greedy solutions, a proof should always be a part of the solution. Often it is really easy to find an incorrect greedy solution, and often you only discover this after the system tests.

Glossary of terms in the proof:
Original gap: a gap between two consequent plazas given in the input (including 0 and pikeLength).
Current gap: the length of the (longest) gap after we placed some new plazas into an original gap.
Solution value: the length of the longest current gap.

So let's now prove that our solution finds an optimal assignment of the plazas. We will do this by induction. For one new plaza this is clearly true. Now for the induction step assume that our solution finds an optimal solution A for K new plazas. How will the situation change if we have K+1 plazas to distribute?

Consider some optimal solution B for K+1 new plazas. Let G be the set of gaps that have less new plazas assigned than in solution A. If G is empty, we won. If not, for each gap in G we can make the following observation:

Consider the moment just before this gap got its last plaza in solution A. At this moment, this original gap was having the longest current gap. Let the length of that current gap be L. The value of solution A is at most L, because before adding some plazas it was L. The value of solution B is at least L, as for this original gap the current gap in B is at least L.

This is only possible if the value of both solutions is exactly L.

Summary: Either G was empty, and our algorithm works, or G was non-empty. In the second case we deduced that the optimal solution for K+1 plazas has the same value as the optimal solution for K plazas. And clearly our algorithm will produce a solution with this value – we take an optimal solution with K plazas and add one new plaza somewhere. Thus our algorithm always finds an optimal solution.

TableLabel rate it discuss it
Used as: Division One - Level Two:
Value 500
Submission Rate 385 / 634 (60.73%)
Success Rate 168 / 385 (43.64%)
High Score Per for 453.37 points (9 mins 18 secs)
Average Score 293.69 (for 168 correct submissions)
The key observation for this problem: suppose someone told us how to split the element at [0][0] into its row and column labels. At this moment either all row labels and column labels are uniquely determined, or we discover that our "someone" lied and that this split doesn't lead to a solution.

Why is that so? Consider only the first row of the table. We already know the row label for this row. This means that we can check whether all elements of this row start with our row label, and deduce all column labels.

The same can be done with the first column to deduce all row labels.

Once we deduced all row and column labels, we may traverse the entire table and check whether all the elements match. If yes, we found a solution.

The last step is simple: There are only a few ways how to split the element at [0][0]. Try them all and see how many of the ways lead to a valid solution.

C++ code follows:
class TableLabel {
public:
  bool isPrefix(const string &A, const string &B) {
    if (A.size() > B.size()) return false;
    string C( B.begin(), B.begin()+A.size() );
    return A==C;
  }

  bool isSuffix(string A, string B) {
    reverse(A.begin(),A.end());
    reverse(B.begin(),B.end());
    return isPrefix(A,B);
  }

  vector <string> labels(vector <string> input) {

    // process the input
    vector< vector<string> > table;
    int R = input.size();
    for (int r=0; r<R; r++) {
      stringstream ss(input[r]);
      vector<string> row;
      string token;
      while (ss >> token) row.push_back( token );
      table.push_back( row );
    }
    int C = table[0].size();

    // table[][] now contains the table

    vector <string> res;

    for (int split=1; split <= int(table[0][0].size())-1; split++) {
      // compute the values for [0][0]
      string row0 ( table[0][0].begin(), table[0][0].begin()+split );
      string col0 ( table[0][0].begin()+split, table[0][0].end() );

      // check and process row 0 and col 0
      vector<string> colLabels, rowLabels;
      bool allOK = true;
      for (int r=0; r<R; r++) {
        if (table[r][0].size() < col0.size()+1) { allOK = false; break; }
        if (!isSuffix(col0,table[r][0])) { allOK = false; break; }
        rowLabels.push_back( string( table[r][0].begin(), table[r][0].end() - col0.size() ) );
      }
      if (!allOK) continue;
      for (int c=0; c<C; c++) {
        if (table[0][c].size() < row0.size()+1) { allOK = false; break; }
        if (!isPrefix(row0,table[0][c])) { allOK = false; break; }
        colLabels.push_back( string( table[0][c].begin() + row0.size(), table[0][c].end() ) );
      }
      if (!allOK) continue;

      // check the entire table
      for (int r=0; r<R; r++)
        for (int c=0; c<C; c++)
          if (table[r][c] != rowLabels[r]+colLabels[c]) allOK = false;
      if (!allOK) continue;

      // record the solution
      if (res.empty()) {
        res.insert(res.end(),colLabels.begin(),colLabels.end());
        res.insert(res.end(),rowLabels.begin(),rowLabels.end());
      } else {
        res.clear(); res.push_back("multiple"); return res;
      }
    }
    if (res.empty()) res.push_back("none");
    return res;
  }
};
Destruction rate it discuss it
Used as: Division One - Level Three:
Value 1000
Submission Rate 95 / 634 (14.98%)
Success Rate 79 / 95 (83.16%)
High Score ACRush for 931.56 points (7 mins 48 secs)
Average Score 648.24 (for 79 correct submissions)
The first important observation about the problem is that it scales. If the problem statement contained the number 200.0 instead of 100.0, the solution would be almost exactly the same. The only change would be that the places of all queries would be multiplied by two, and of course also the answer would be twice as large.

From this point on, we will consider a normalized version of the problem: the upper bound on pressure will be equal to 1.

Suppose that we want to find an optimal solution for T tests and at most D destructions. Consider the first query we make. Let the pressure for this query be X. There are two things that can happen: In the first case, the container breaks, in the second case it doesn't.

In the first case, we know that the exact pressure is in the interval [0,X], and we have T-1 tests and at most D-1 destructions to find it. In the second case, the pressure is in the interval [X,1], and we have T-1 tests and at most D destructions left.

In both cases we have (almost) the same problem with lower constraints. We can solve each of these problems recursively, and memoize the solutions.

There are two reasons for the "(almost)" in the previous paragraph. One of them is the need to scale the results appropriately. The other reason is the need to handle the upper bound in a special way. Note that there is a difference between the two subproblems presented above. In the first case we already know that at pressure X the container breaks. In the second case, we may not know this.

A conceptually clear way to solve this: Let best[T][D][U] be the correct answer for the interval [0,1], T tests, at most D destructions and U is 1 if we know that the container breaks at pressure 1. Clearly the solution to the original problem is the value 100*best[numTests][numDestroyed][0]. We will compute the values best[T][D][U] using dynamic programming, or equivalently, using memoized recursion.

We already described how to split best[T][D][U] into two subproblems. The first of them will correspond to best[T-1][D-1][1], the second will correspond to best[T-1][D][U]. For each subproblem we can compute the answer as a function of X – the place of the query.

We now have to find an optimal value of X. We want to minimize the larger of the two answers for subproblems. Clearly this happens if they are both equal. This leads to a linear equation for X. We compute the optimal X, and thus the answer to our original problem.

C++ code follows:
double memo[52][52][2];

double solve(int numTests, int numDestroyed, int knowsUpper) {
  double &res = memo[numTests][numDestroyed][knowsUpper];
  if (res >= 0.0) return res;
  res = 1.0;

  // handle the end of the computation
  if (numDestroyed == 0) return res = 0.5;
  if (numTests == 1 && !knowsUpper) return res = 0.5;
  if (numTests == 1 && knowsUpper) return res = 0.25;
  
  // handle the general case: solve two subproblems
  double prec1 = solve(numTests-1,numDestroyed-1,1);
  double prec2 = solve(numTests-1,numDestroyed,knowsUpper);

  // now solve the equation: prec1*x == prec2*(1-x)
  double x = prec2 / (prec1+prec2);
  return res = prec1 * x;
}

class Destruction {
public:
  double minError(int numTests, int numDestroyed) {
    for (int i=0; i<52; i++) 
      for (int j=0; j<52; j++)    
        for (int k=0; i<2; k++)
          memo[i][j][k] = -1.0;
    return 100.0*solve(numTests,numDestroyed,0);
  }
};
There is a solution that is simpler to implement. All we have to do is to convince ourselves that the optimal solution has the following property: Regardless of the outcomes of the probes, the answer will always be exactly the same.

(The reasoning behind this observation is similar to the reasoning we used to find the optimal X – if we were able to achieve a better precision in some cases, we could shift the probes slightly to "distribute" this gain and thus decrease all largest possible answers. Alternatively, we already have a proof of this observation: our DP solution above computes an optimal solution, and it has this property.)

One fixed experiment result can be described as a string of 'Y's and 'N's, where the i-th character corresponds to the outcome of our i-th probe – 'Y' means that the container broke, 'N' that it didn't. For each such string S we can determine an interval IS = [loS,hiS] such that the exact answer lies within IS. Clearly the precision of the result after an experiment corresponding to S is the length of IS divided by 2.

Suppose we use an optimal approach. Write down all strings that correspond to possible experiments, and are maximal (i.e., no more tests are possible in the situation they describe). If we now take a look at their corresponding intervals, the optimality of our approach enforces two properties:
– each two intervals are disjoint (except maybe for their endpoints)
– all the intervals have the same length, and the answer is a half of this length

Now we can compute the answer without knowing the optimal solution. All we have to do is to count the valid strings, the answer we seek is 50.0 divided by their count.

To count the maximal strings, we will use one simple trick. Some of the strings have length equal to numTests, but some are shorter. Those correspond to situations when we ran out of containers to break. Now we can take each such string and append 'N's until its length reaches numTests.

Clearly this is a bijection. Note that if you have the modified string, you can reconstruct the original string in the following way: Count the 'Y's. If their count is less than numDestroyed, this is the original string. If it is equal, the original string ends with the last 'Y'.

Thus we get the following equation:
the count of maximal strings = the count of strings of length numTests that contain at most numDestroyed 'Y's

One small twist and we are done. The twist is that in our problem the string "NNN...N" is special. It corresponds to the case "the answer exceeds 100.0", and thus the corresponding interval has length zero.

Thus the number of interesting strings is ∑i=1numDestroyed choose(numTests,i), where choose(N,K) is the binomial coefficient.

Author
By misof
TopCoder Member