JOIN
Get Time
statistics_w  Match Editorial
2003 TopCoder Open
Online Round 4

Wednesday, November 5, 2003

Summary

Call it nerves. Call it writers pulling out all the stops. Call it Round 4 of the TopCoder Open! Faced with an Easy problem that would have been a Medium on any other night, and a Medium problem that would have been a Hard on any other night, 50 of TopCoder's top coders managed 37 correct submissions. That's 37 total. On all three problems combined.

tjq was first on the scoreboard with a quick submission on the Easy. But by about the 12 minute mark, Easy submissions were rolling in. The coders quickly turned to the Medium and then...nothing. Silence. Not even any hopeful compiles. Eventually, many programmers began to abandon the Medium and turn to the Hard. Finally, 39 minutes into the coding phase, SnapDragon grabbed the top spot with the first submission on the Medium. Two minutes later tomek came out of nowhere to steal the lead. tomek had struggled with the Easy, taking 24 minutes to complete it. But then he flew through the Medium in an amazing 17 minutes to gain back all the points he had lost. And so, with 34 minutes of coding still on the clock, the contest was decided, as not a single Hard submission would later system tests. Congratulations to tomek, SnapDragon, and the other 14 advancers!

The Problems

WhichData discuss it
Used as: Division One - Level One:
Value 300
Submission Rate 45 / 50 (90.00%)
Success Rate 32 / 45 (71.11%)
High Score tjq for 282.05 points (7 mins 15 secs)
Average Score 202.98 (for 32 correct submissions)

In this problem, you enumerate all possible nonempty subsets of the data and calculate the variance of each, keeping the subset whose variance is closest to the target.

There are two easy approaches to generating the subsets: bit masks and recursion. To iterate through the subsets using bit masks, you count from 1 to 2^N-1, where N is the number of elements in the data. Then you treat each index as an N-bit number. If the i-th bit of this number is 1, then element i of the data is in the subset. If the i-th bit is 0, then element i is not in the subset. Usually in these kinds of loops, you count from 0 to 2^N-1, but this time you start at 1 to avoid the empty set.

To enumerate subsets using recursion, you consider the elements one at a time, trying all the subsets with that element and then trying all the subsets without that element, as shown in the following pseudocode:

  enumerateSubsets(i, currentSet)
    if i == N then return
    otherwise,
      process variance of currentSet + element i
      enumerateSubsets(i+1, currentSet + element i)
      enumerateSubsets(i+1, currentSet)
Most coders went for the bitmask solution, but the recursive algorithm had one compelling advantage for this problem. Assuming you sort the sample data first, then it is trivial to make the recursive solution generate the subsets in lexicographic order. In fact, the pseudocode above does exactly that! With the bitmask version, checking lexicographic order to break ties is somewhat tricky.

The main remaining issue is precision. If you had a prewritten Fraction class, this would have been a good time to bring it out. Otherwise, you probably used doubles. When comparing two doubles, you had to treat them as equal if the difference between them was less than 10^-9. And if they were equal, then the tie-breaker rules involving lexicographic order apply.

There was a minor but useful trick that could be used to avoid recomputing the variance of each subset from scratch. The formula for the variance was given as

     mean = ( s(0) + s(1) + s(2) + ... + s(n-1) )/n
     varsum = (mean-s(0))^2 + (mean-s(1))^2 + ... + (mean-s(n-1))^2
     variance = varsum/n
At first glance, it looks like you can't compute varsum until you know the mean, and you can't compute the mean until you know the size of the subset. But note that
     varsum = (mean-s(0))^2 + (mean-s(1))^2 + ... + (mean-s(n-1))^2
            = n*mean^2 - 2*mean*sum + sumOfSquares
where
     sum           = s(0) + s(1) + ... + s(n-1)
     sumsOfSquares = s(0)^2 + s(1)^2 + ... + s(n-1)^2
Therefore,
     varsum   = n*mean^2 - 2*mean*sum + sumOfSquares
              = n*(sum/n)^2 - 2*(sum/n)*sum + sumOfSquares
              = sum^2/n - 2*sum^2/n + sumOfSquares
              = sumOfSquares - sum^2/n
and
     variance = varsum/n
              = sumOfSquares/n - sum^2/n^2
Therefore, to calculate the variance, you only need to keep track of the sum of the elements, the sum of the squares of the elements, and the number of elements. This fact is particularly handy in the recursive formulation, where you use the partial sums to calculate the variance of one sample and then pass the partial sums on to the recursive calls, instead of recalculating the variance from scratch each time you want to process a new set.

For an example of the bitmask approach, see the solutions of practically anybody that passed. For an example of the recursive approach, see my practice room solution.

Jewelry discuss it
Used as: Division One - Level Two:
Value 500
Submission Rate 7 / 50 (14.00%)
Success Rate 5 / 7 (71.43%)
High Score tomek for 378.97 points (17 mins 15 secs)
Average Score 286.10 (for 5 correct submissions)

At first glance, this problem appears to be a straightforward variation on the knapsack theme. As lars announced in his trademark fashion, "It's TRIVIAL!" You would expect a crowd as well-versed in dynamic programming as this one to breeze through the problem. Ah, but there was a twist! The treatment of equal elements gave folks fits.

Without equal elements, the problem could be solved as follows:

  sort the elements in increasing order
  count = 0
  for each i from 1 .. # of elements do
    // assume element i is Bob's highest valued item
    waysBelow = ways to make sums using elements below i
    waysAbove = ways to make sums using elements above i
    for each sum s from element i upto max do
      count += waysBelow[s - element i] * waysAbove[s]
  return count
Given K elements 1..K, you can calculate the number of ways to make various sums of those values using a typical knapsack algorithm:
  initialize array ways[0..Max] to all zeros
  ways[0] = 1
  for each i from 1 upto K do
    for each sum s from max downto element i do
      ways[s] += ways[s - element i]
At the end of these loops, ways[s] contains the number of ways to choose elements that sum to s.

Equal elements complicate the picture because we can no longer guarantee that Bob's elements are all to the left of Frank's elements in the sorted list. However, the only exceptions occur when Bob and Frank take elements of equal value. In each such case, we need to consider all the different ways that Bob and Frank can exchange their equal elements. Fortunately, the changes to the overall algorithm are relatively minor.

We now need to process the elements group by group instead of element by element, where each group contains equal elements. Suppose there are G equal elements in a group. For each size g between 1 and G, we calculate how many distributions there are in which Bob gets g items from this group. Then we multiply by Choose(G,g) to account for the different ways to pick g elements out of the group. Altogether, the final algorithm looks like

  sort the elements in increasing order
  count = 0
  for each group of equal elements do
    i = first index of group
    G = size of group
    waysBelow = ways to make sums using elements below i
    for each g from 1 to G do
      waysAbove = ways to make sums using elements above i+g-1
      for each sum s from g * element i upto max do
        count += Choose(G,g) * waysBelow[s - g * element i] * waysAbove[s]
  return count

Clue discuss it
Used as: Division One - Level Three:
Value 1000
Submission Rate 6 / 50 (12.00%)
Success Rate 0 / 6 (0.00%)
High Score n/a
Average Score No correct submissions

"No correct submissions." That's not something you see very often in TopCoder, especially in a match involving coders like SnapDragon, tomek, and snewman. And you wouldn't have seen it in this match either if folks hadn't spent so much time working on the previous problem.

There are two basic approaches you can take to this problem. First, you can try to use logical reasoning and trace through the implications of each guess. I tried this, but 2 1/2; hours later, I was still hacking away, with no end in sight. I'm eager to see somebody submit a complete deductive solution in the practice room!

Or you can try brute force. Check all possible combinations of cards and rule out any combinations that are inconsistent with one or more of the guesses. First, you have to convince yourself that this is feasible. There are at most 6*6*9 = 324 ways to assign the secret cards and Choose(12,6) = 12!/6!6! = 924 ways to assign the remaining cards. So there are at most 324 * 924 ~ 300000 ways to deal the cards. For each deal, you have to check at most 50 guesses for consistency, so you have to check at most 300000 * 50 = 15 million guesses. This is just barely possible within the 8 second time limit, but only if you use a very efficient representation of guesses, such as bitmasks. If you use something like strings to represent the guesses, you're doomed. However, you don't really need to check all 6*6*9 ways to assign the secret cards, because you can immediately rule out any combination that includes one or more cards from player 1's hand. That leaves less than 6 million guesses to check, which is easily possible using bitmasks.

Many other optimizations are conceivable. Of these, the most useful is to skip any combination of three secret cards in which the three cards have already been determined to be part of the result as parts of other combinations. For example, if you already know that the secret cards could have been S1-W2-L3 or S3-W2-L1, then you don't need to test whether the secret cards could have been S3-W2-L3 because those cards will all be in the final result even if that particular combination is not allowed.

To check that a particular guess is consistent with a deal, you need to check four things:

  • If responder is 0, make sure the two non-guesser players don't have one of the guessed cards.
  • If responder is the player on guesser's right, make sure the player on guesser's left doesn't have one of the guessed cards.
  • If responder is not 0, make sure responder actually has one of the guessed cards.
  • If the response is not "N0", make sure the responder actually has the card that was shown.
If all of these tests are satisfied, then the guess is consistent with the deal. All four of these tests are easy to code using bitmasks—see my solution in the practice room.

Author
By vorthys
TopCoder Member