SRM 89
May 18, 2002

Match summary
Single Round Match 89 offered a nice diversity of problems, ranging from some simple mathematics to digital picture processing techniques to some advanced graph theory. Thus there is much to be learned from this match.

Division 1 coders fared rather poorly, most likely because of misleading point values and difficult problems.

The Problems

Div. 2, Level 1: Average
The solution to Average was exactly as it seemed: compute the average, and count the number of children whose total scores were below average. Common mistakes on this problem were:
  • integer division when computing the average
  • casting the average to an integer

Whenever using the / operator, keep in mind that it performs integer division if both of its operands are of integral types (even when the result is assigned to a float or double). Thus, to force floating-point division, cast at least one of the operands to a floating-point type.
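For example (a minimal sketch; the values are illustrative):

int sum = 17, count = 3;
double wrong = sum / count;             /* integer division happens first: wrong == 5.0 */
double right = (double) sum / count;    /* the cast forces floating point: right == 5.666... */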

Div. 1, Level 1 / Div. 2, Level 2: Powerful
The definition of Powerful was to find the largest integer k > 1 such that b^k = n for some integer b and the given integer n. The main trick to this problem was reducing the search space, and then performing some sort of search in the reduced space. One method was to observe that, since b > 1 must hold, the maximum value of k is log2(n). Given the upper limit of n, we know that k <= 30. Thus one only had to iterate k from 2 to 30 inclusive, and obtain the largest k for which n^(1/k) is an integer (if any). The main problem with this method was determining b from k. Mathematically this works out to b = n^(1/k), which is difficult to compute accurately because 1/k cannot be represented exactly in floating point for most values of k.

Another method was to observe that, since k >= 2 must hold, the maximum value of b is sqrt(n). This gives an upper bound of 14142. For each b between 2 and 14142 inclusive, iterate through the powers b^k until b^k >= n. Since smaller bases yield larger exponents, if we iterate through values of b in ascending order, the first pair (b, k) that satisfies b^k = n will be our answer.
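A sketch of this second method, assuming n is at least 2 and small enough that the powers fit comfortably in a long (the function name and the -1 "not found" convention are choices made here):

int powerful(long n)
{
    for(long b = 2; b <= 14142; b++) {
        long power = b * b;
        int k = 2;
        while(power < n) {
            power *= b;
            k++;
        }
        if(power == n)
            return k;    /* the smallest b gives the largest k */
    }
    return -1;           /* n is not a perfect power */
}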

The most common problems were fencepost errors (upper bounds off by one) and floating-point instability (particularly the issue with the first method described above).


Div. 2, Level 3: Filter
The Filter problem in Division 2 is interesting in that it represents a common digital picture processing operation called opening. The opening operation takes an input image and a structuring element, which defines the effect of the opening. It is defined as an erosion of the image using the structuring element, followed by a dilation using the same structuring element. Good definitions, discussions, and examples of these operations can be found in standard image processing references.

The Filter problem is equivalent to opening the input image with the following structuring element (with origin at the center):

0 1 0
1 1 1
0 1 0


The easiest method would be to simply implement the erosion and dilation operators for the structuring element above. Each could be implemented as a local operator -- that is, a function that, given an image and a location in that image, returns a pixel value to be placed in the same location in a new image. The erosion and dilation operators are then:

/* a representation of which pixels are turned on in our structuring element */
int[][] offsets = { { 0, 0 }, { 0, 1 }, { 0, -1 }, { 1, 0 }, { -1, 0 } };

/* pixels outside the image are off */
boolean get_pixel(boolean[][] image, int r, int c)
{
    return r >= 0 && c >= 0 && r < image.length && c < image[0].length && image[r][c];
}

boolean erosion_operator(boolean[][] image, int r, int c)
{
    /* the output pixel is on only if every pixel under the structuring element is on */
    for(int i = 0; i < offsets.length; i++)
        if(!get_pixel(image, r + offsets[i][0], c + offsets[i][1]))
            return false;
    return true;
}

boolean dilation_operator(boolean[][] image, int r, int c)
{
    /* the output pixel is on if any pixel under the structuring element is on */
    for(int i = 0; i < offsets.length; i++)
        if(get_pixel(image, r + offsets[i][0], c + offsets[i][1]))
            return true;
    return false;
}
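To complete the solution, each operator is swept over the whole image, with the erosion feeding into the dilation. A sketch, assuming a non-empty rectangular image (the names apply and open are choices made here):

boolean[][] apply(boolean[][] image, boolean erode)
{
    boolean[][] result = new boolean[image.length][image[0].length];
    for(int r = 0; r < image.length; r++)
        for(int c = 0; c < image[0].length; c++)
            result[r][c] = erode ? erosion_operator(image, r, c)
                                 : dilation_operator(image, r, c);
    return result;
}

/* opening = erosion followed by dilation */
boolean[][] open(boolean[][] image)
{
    return apply(apply(image, true), false);
}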


In general, solutions to this problem could be seen as some variant of an opening.



Div. 1, Level 2: HexGolf
The HexGolf problem was to find the minimum number of strokes needed to move a ball to a particular position in a hexagonal grid. For each stroke one can choose from a set of clubs, each of which moves the ball a certain distance in any direction parallel to one of the three axes of the hexagonal grid. We are given an upper bound of 5 strokes; if the hole cannot be reached in 5 strokes or fewer, -1 is returned.

Algorithm

The solution is a breadth-first search: we maintain a frontier of all the possible locations the ball can occupy after k strokes. Initially the ball is somewhere in the tee area. For each stroke, we generate a new frontier from the previous one: for each location in the frontier, we iterate through each club and each direction and compute the location the ball would reach if struck with that club in that direction. This new location goes into the new frontier. If the new location is where the hole is, we return the number of strokes it took to get there. Since this is a breadth-first search, the first solution we find is by definition the one with the minimum number of strokes.
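A sketch of this search in code. It assumes the (e, se) coordinate convention and the minRadius helper described under Implementation below, bakes in the tee-area translation trick explained later in this section (so it assumes the hole does not already lie within the tee area), and uses illustrative names throughout:

/* the six axis directions as (e, se) unit steps: E, W, SE, NW, NE, SW */
int[][] dirs = { { 1, 0 }, { -1, 0 }, { 0, 1 }, { 0, -1 }, { 1, -1 }, { -1, 1 } };

int solve(int[] clubs, int holeE, int holeSE, int teeSize)
{
    java.util.List<int[]> frontier = new java.util.ArrayList<int[]>();
    frontier.add(new int[] { 0, 0 });    /* tee the ball at the origin */
    for(int stroke = 1; stroke <= 5; stroke++) {
        java.util.List<int[]> next = new java.util.ArrayList<int[]>();
        for(int[] loc : frontier)
            for(int club : clubs)
                for(int[] d : dirs) {
                    int e = loc[0] + club * d[0], se = loc[1] + club * d[1];
                    if(minRadius(e - holeE, se - holeSE) <= teeSize)
                        return stroke;
                    next.add(new int[] { e, se });
                }
        frontier = next;
    }
    return -1;
}

Note that this stores each frontier in full; the depth-first variant discussed at the end of this section avoids that memory cost.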

There are at most 5 clubs, and exactly six directions. Thus for each location in an old frontier we will generate at most 6n new locations for n clubs. If there are x locations in the old frontier, we will obtain at most 6nx locations in the new frontier.

Since the work the search performs grows with the size of the initial frontier, we want that size to be 1. However, we are given as input an area about the origin in which the ball may start. If we were to use this entire area as the initial frontier, our solution would fail to finish in time (and would consume too much memory): the upper bound on teeSize is 10000, and 60000^5 happens to be a very large number.

The trick, instead, is to think about the effect of translating the ball from the origin to some location within the tee area. We can think of our solution as generating a set of paths (of length at most 5) through the hexagonal grid. Suppose we have the set of paths originating at the origin (that is, our BFS computed with the origin as the only element of the initial frontier). If we were to translate the ball along the east and southeast axes, the effect would be to translate every point of every path in exactly the same manner.

Therefore we can see that there is no point in adding all the points within the tee area to our initial frontier. Instead, all we have to do is translate the tee area so that it is centered at each location we reach, and see if the hole lies within it. In other words, for each location we reach, instead of checking whether it is the same location as the hole, we check whether it is within a distance of teeSize of the hole.

So, we begin with a frontier of size 1 and build at most 5 frontiers. Therefore, in the worst case, we will visit (6n)^5 = 7776n^5 locations. For n = 5, this is 7776 * 3125 = 24,300,000, meaning that this solution is tractable given the constraints.

Implementation

The only real difficulty in implementing the above algorithm is working with the hexagonal coordinate system. For representing locations, we will use the same method as that of representing the hole location: a coordinate pair (e, se) representing a distance of e travelled east followed by a distance se travelled southeast. Let the location of the hole be (e0, se0). We need a way of computing the minimum radius of a hexagon centered at (e0, se0) that contains (e, se). To do this, we first find (e', se') = (e, se) - (e0, se0). If e' and se' have the same sign, the minimum radius is |e' + se'|. Otherwise, the minimum radius is max(|e'|, |se'|). If this minimum radius is less than or equal to teeSize, we have hit the hole.
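That computation translates directly into code (minRadius is a name chosen here):

/* minimum radius of a hexagon centered at the origin that contains (e, se) */
int minRadius(int e, int se)
{
    if((e >= 0) == (se >= 0))    /* same sign (zero counts as either) */
        return Math.abs(e + se);
    return Math.max(Math.abs(e), Math.abs(se));
}

Hitting the hole is then the test minRadius(e - e0, se - se0) <= teeSize.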

Most people in fact implemented a depth-first search rather than a breadth-first search, as it is easier to implement and requires much less memory. For this problem, a depth-5 DFS explores the same locations as an exhaustive BFS, so its worst-case runtime is the same. Since the DFS requires no queue maintenance, it is the quickest solution to write.

Div. 1, Level 3: Buddy
Despite its modest value (900 points), this was easily one of the most difficult problems ever posed in a TopCoder competition. This problem can be reduced to finding a maximum-weight matching in a bipartite graph (if a complete pairing exists at all).

Since I was unable to solve this problem myself, I looked to the few submissions that passed the system tests to see how others approached the problem. There seemed to be two types of solutions: depth-first searches and graph-theoretic solutions.

Depth-First Search

Algorithm

The first step is to precompute the number of row-pairings one can get for any row-pairing configuration. A row-pairing configuration can be represented as a binary string, such as "1100" for a row of length 4. A 1 bit represents a child that is paired with someone in the same row, while a 0 bit represents either an adult or a child paired with someone in the same column. We mark as invalid configurations that do not describe a valid row-pairing (e.g. "1101" is invalid because the right-most child is not actually paired with anyone); otherwise we count the number of pairs in the configuration (e.g. "1100", "0110", and "0011" each represent one pair, and "1111" represents two pairs).
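A sketch of this precomputation, using -1 as the INVALID sentinel (the names are choices made here):

int INVALID = -1;

/* numpairs[c] = number of row-pairs in configuration c, or INVALID */
int[] buildNumpairs(int n)
{
    int[] numpairs = new int[1 << n];
    for(int c = 0; c < (1 << n); c++) {
        int pairs = 0;
        boolean valid = true;
        for(int i = 0; i < n; ) {
            if((c & (1 << i)) == 0) { i++; continue; }
            /* every run of 1 bits must pair off into adjacent couples */
            if(i + 1 < n && (c & (1 << (i + 1))) != 0) { pairs++; i += 2; }
            else { valid = false; break; }
        }
        numpairs[c] = valid ? pairs : INVALID;
    }
    return numpairs;
}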

Now we can recursively generate all valid pairings in the square of kids, subject to the constraints of the problem. We do this by working from the bottom row up (or the top row down). For each row, we have a mask which consists of the locations occupied by adults or by children paired with children in the previous row. We then iterate through each valid row-pairing configuration. If the configuration does not contain any pairs that occupy the same locations as adults or children paired with someone in the previous row, then we generate a new mask and recursively move on to the next row. If we visit all the rows and end up with an empty mask, then we have a valid pairing for all the kids. We can count the number of row-pairs as we go along, and then remember the maximum value found when a valid pairing was reached. Thus our counting function looks something like:
int n;              /* side length of the square of kids */
int[] numpairs;     /* filled in as above; INVALID marks bad configurations */
int[] adults;       /* adults[row] = bitmask of adult locations in that row */
int best = -1;

/* true if configuration i uses no location in the occupied mask */
boolean fits(int i, int occupied)
{
    return (i & occupied) == 0;
}

void count(int row, int mask, int running_total)
{
    if(row == -1) {
        if(mask == 0 && running_total > best)
            best = running_total;
        return;
    }
    for(int i = 0; i < (1 << n); i++)
        if(numpairs[i] != INVALID && fits(i, mask | adults[row]))
            count(row - 1, (1 << n) - 1 - (i | mask | adults[row]),
                  running_total + numpairs[i]);
}

count(n - 1, 0, 0);
return best;
Keep in mind that x << y is equivalent to x * 2^y.

Implementation

Yarin's solution is an excellent implementation of this algorithm, using bitwise operators to handle all the configuration representations and masks. There are 2^n row-pairing configurations (many of which are invalid). These can be represented as an array indexed by an integer whose binary representation is also that of the configuration. The values stored in the array are then the number of row-pairs in each configuration (or some special value to represent invalid configurations).

The binary representation of an integer can also represent a parent mask or the mask derived from a previous row's configuration. If applying the bitwise AND operation to a row-configuration and either of these masks results in a non-zero value, then that particular row-configuration does not fit the constraints. Otherwise the mask for the next row can be obtained by applying a bitwise OR operation to the row configuration and the two masks and computing the complement of that value (i.e., ~(rowconfig | parentmask[row] | mask)). We add to our running sum the number of pairs defined by the row configuration, and recurse to the next row. If there is no next row, and our new mask is 0, we remember our sum of row-pairs if it is greater than the best value obtained so far.

For further optimization, we need to employ memoization techniques for our recursive function. Since it takes as parameters 0 <= row < n and 0 <= mask < 2^n, we can create an array of dimension n x 2^n for remembering values.
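Memoization requires restructuring count to return the best total from a given state instead of updating a global. A sketch (MEMO_UNSET is a sentinel chosen here; it must differ from -1, which is a meaningful result):

int MEMO_UNSET = -2;
int[][] memo = new int[n][1 << n];    /* fill with MEMO_UNSET before the first call */

/* best number of row-pairs from this row down, or -1 if no valid pairing exists */
int count(int row, int mask)
{
    if(row == -1)
        return mask == 0 ? 0 : -1;
    if(memo[row][mask] != MEMO_UNSET)
        return memo[row][mask];
    int best = -1;
    for(int i = 0; i < (1 << n); i++)
        if(numpairs[i] != INVALID && fits(i, mask | adults[row])) {
            int rest = count(row - 1, (1 << n) - 1 - (i | mask | adults[row]));
            if(rest != -1)
                best = Math.max(best, rest + numpairs[i]);
        }
    return memo[row][mask] = best;
}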

Graph Theoretic

The group of children can be represented as a graph, where each child is a vertex and there is an edge between each pair of adjacent children. It turns out that this graph is bipartite. A bipartite graph consists of two sets of vertices (we will call them V1 and V2) such that no two vertices within the same set are adjacent. If we were to place the children onto a chessboard, V1 would consist of children standing on white squares and V2 would consist of children standing on black squares.

The definition of the problem gives that each child (vertex) should have at most one partner (edge); to be valid, every child should have exactly one partner, but we will get to that shortly. In graph-theoretic terms, this is what is known as a matching: a set of edges such that no vertex is included in more than one of them. It follows that any pairing of children (even an incomplete one) is a bipartite matching.

A maximum bipartite matching is a matching of maximum cardinality in a bipartite graph. If the maximum bipartite matching contains (n^2 - adults) / 2 edges, then there exists a way to pair up all of the children. Although simply finding a maximum bipartite matching is insufficient to solve this problem, it is helpful to demonstrate how one might be found.

The maximum bipartite matching problem can be reduced to a network flow problem: attach a source vertex to all of the vertices in V1 and a sink vertex to all of the vertices in V2, direct every edge from V1 toward V2, and give each edge a capacity of 1. The Ford-Fulkerson algorithm or some other flow maximization algorithm can then be employed to obtain a maximum flow. The edges used by the maximum flow between V1 and V2 then constitute a maximum matching.
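For the unweighted case, the same result can be obtained with the classic augmenting-path approach, shown here as a sketch (adj[u] is assumed to list the V2 neighbours of V1 vertex u; the names are choices made here):

java.util.List<Integer>[] adj;   /* adjacency lists from V1 into V2 */
int[] match;                     /* match[v] = V1 vertex matched to v, or -1 */

boolean augment(int u, boolean[] visited)
{
    for(int v : adj[u])
        if(!visited[v]) {
            visited[v] = true;
            /* take v if it is free, or try to re-route its current partner */
            if(match[v] == -1 || augment(match[v], visited)) {
                match[v] = u;
                return true;
            }
        }
    return false;
}

int maxMatching(int n1, int n2)
{
    match = new int[n2];
    java.util.Arrays.fill(match, -1);
    int size = 0;
    for(int u = 0; u < n1; u++)
        if(augment(u, new boolean[n2]))
            size++;
    return size;
}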

However, we cannot stop at simply obtaining a maximum bipartite matching: we need the matching that gives the most horizontal pairings. To obtain this, we assign weights to the edges of the bipartite graph: each horizontal edge (between children on the same row) is given a weight of 1, and every other edge a weight of 0. Unfortunately, once we do this, we can no longer use the network flow algorithm above to solve the problem. The complete matching with the maximum cumulative weight then gives us the answer.

The solution we need for this particular problem is the solution to the maximum weighted bipartite matching problem, also known as the assignment problem. One algorithm that solves it is the Kuhn-Munkres algorithm, also known as the Hungarian algorithm.

Author
By Logan
TopCoder Member
