Match Editorial

SRM 427
Tuesday, December 2, 2008

Match summary

This SRM featured the 4th problemset written by Vasyl[alphacom], this time all problems were about strings. In Div-1 coders were proposed a straightforward easy, a strong medium and a very tough hard problems. However, the number of really hard cases in the Hard was small and it was possible to precompute the answers for them. Among those who went this way, 4 coders were able to solve the Hard problem, what allowed them to take first 4 places even despite the fact that two of them didn't solve the Medium. Match victory went to ACRush who solved all 3 problems and made the fastest submission on 1000-pointer overall. The second place was taken by bmerry, who also solved all problems. Soultaker rounded up the Top-3 and Psyho took 4th place, they both didn't even submit the medium problem.

In Div-II the set was significantly easier. The first three places were taken by rchard, armansuleimenov and agsharath, solving all 3 problems. sky58, author of the fastest 1000-pointer, finally occupied only 16th position, because his submission on the medium was successfully challenged.

The Problems

ThePalindrome

Used as: Division Two - Level One:

Value 250

Submission Rate 630 / 758 (83.11%)

Success Rate 453 / 630 (71.90%)

High Score mbr1 for 247.63 points (2 mins 47 secs)

Average Score 187.26 (for 453 correct submissions)

First, note that the answer in this problem is not large. If we append to s all its characters in reverse order, the resulted string will be a palindrome, so the answer does not exceed twice the length of s. Therefore, for each i between 0 and length of s we need to answer the following question: does it possible to append i characters to the end of s to obtain a palindrome? The answer for our problem is simply (the smallest among appropriate i's) + (the length of s).

If want to append i characters to make a palindrome, there's actually no any choice in what exactly characters to append. The last appended character must be the same as the first character of s (otherwise it won't be a palindrome for sure), the last but one appended character must be the same as the second character of s, and so on. In other words, we should append the first i characters of s in reverse order. Of course, after these characters are added, the result is still not guaranteed to be a palindrome, so this needs to be additionally checked.

This approach can be implemented in Java as follows:


public class ThePalindrome {

  boolean isPalin(String s) {

    for (int i=0; i<s.length(); i++)

      if (s.charAt(i) != s.charAt(s.length()-i-1))

        return false;

    return true;

  }



  public int find(String s) {

    for (int i=0; ; i++) {

      String tmp=s;

      for (int j=i-1; j>=0; j--)

        tmp += s.charAt(j);

      if (isPalin(tmp))

        return i + s.length();

    }

  }

}

Excercise (not of DII-Easy level). The described solution of the problem has a complexity of O(n^2) and works very fast when n ≤ 50. Find a solution for this problem that will work for n ≤ 100,000 (it should have a complexity of O(n * log n) or even O(n)).

TheLuckyString

Used as: Division Two - Level Two:

Value 500

Submission Rate 314 / 758 (41.42%)

Success Rate 200 / 314 (63.69%)

High Score samsam for 496.57 points (2 mins 21 secs)

Average Score 381.42 (for 200 correct submissions)

Used as: Division One - Level One:

Value 250

Submission Rate 586 / 624 (93.91%)

Success Rate 535 / 586 (91.30%)

High Score JongMan for 249.56 points (1 mins 11 secs)

Average Score 218.62 (for 535 correct submissions)

Since s contains at most 10 characters, there are at most 10!=3,628,800 ways to permute its letters. This is not too much, so nothing prevents us from checking all strings that can be obtained by permuting the characters of s and calculate how many of them are lucky. The most natural way to implement this is using recursion. Our recursive method will try to assign characters of s to positions 0, 1, ..., L-1, in order (where L is the length of s), in all possible ways, and so that they form a lucky string. It takes two parameters - integer pos is a position we are currently filling and character prev is the character we've put onto the previous position. There's also a global array have of 26 integers, where have[i] is how many i-th (in alphabet) letters we currently have. For each letter c in alphabet such that we have this letter and it is different than prev, we try to put this letter on position pos (we can't put the letter prev on position pos because we want to generate only lucky strings). To do this, we decrease the number of occurrences of the letter c in have by 1, recursively call the method with parameters pos + 1 and c, and than restore the state of have by increasing the number of occurrences for c by 1. If in some call pos = L, it means that we've just generated one more lucky string, so we increase some global counter cnt by 1. The search is initiated by a call with parameters pos=0 and prev=' ' (or some other character that is not a letter - to indicate that every character can be put at position 0). The answer is the value of counter cnt after the method has completed its work.

The actual code appears to be shorter than its explanation given in the previous paragraph. Java implementation follows.


public class TheLuckyString {

  int[] have = new int[26];

  int cnt = 0, L;



  void solve(int pos, char prev) {

    if (pos == L) {

      cnt++;

      return;

    }

    for (char c = 'a'; c <= 'z'; c++)

      if (prev != c && have[c - 'a'] > 0) {

        have[c - 'a']--;

        solve(pos + 1, c);

        have[c - 'a']++;

      }

  }



  public int count(String s) {

    L = s.length();

    for (int i=0; i < L; i++)

      have[s.charAt(i) - 'a']++;

    solve(0, ' ');

    return cnt;

  }

}

Excercise. The described solution works in O(L!). Can you come with something significantly faster? What about polynomial solution? (Hint: this problem is very similar to a DI-Hard problem from a recent SRM; you can just adapt solutions for that problem to get fast solutions for this one).

TheDictionary

Used as: Division Two - Level Three:

Value 1000

Submission Rate 115 / 758 (15.17%)

Success Rate 22 / 115 (19.13%)

High Score sky58 for 760.85 points (17 mins 5 secs)

Average Score 532.96 (for 22 correct submissions)

The first question we should answer when solving this problem is how many strings overall are in the dictionary made by Brus? To generate any such string, we must first choose n positions for letters 'a' among n+m available positions, and than to put m letters 'z' onto m positions that are not chosen. So the number of possible strings is the same as the number of ways to choose positions for letters 'a', which is clearly equal to C(n+m, n) = (n + m)! / (n! * m!). If k > C(n+m, n), we can simply return an empty string.

Otherwise, let's break all strings into two groups - those that start from 'a' and those that start from 'z'. Obviously, all strings in the first group are alphabetically smaller than strings from the second group. If we ignore the first character (which is anyway fixed within each group), then each string in the first group contains n-1 'a's and m 'z's, therefore it contains overall C1 = C(n+m-1, n-1) strings. Using this number, we can determine the first character of the k-th string in the dictionary. If k ≤ C1, this string falls within the first group and therefore starts from 'a'. Otherwise, if k > C1, the string falls within the second group and starts from 'z'. We can also determine the index of the k-th string within its group. If it's in the first group, this index is still k, otherwise the index is k-C1 (we must subtract all strings from the first group).

Note that if we ignore the first character in each group, the rest will look exactly like the initial dictionary, but with different parameters n and m. The first group is a dictionary for n' = n-1 and m' = m, and the second group is a dictionary for n' = n and m' = m - 1. Since we know the index of our string within its group and the group is again a dictionary, we can apply the same argument from the previous paragraph to determine the second character of the string. Repeating the process n + m times will allow us to get all the characters.

In order to calculate the values of C it's useful to apply the recurrence C(a, b) = C(a-1, b) + C(a-1, b-1). There's also the following difficulty - the values of C can be very large, for example C(200, 100) is a number containing 59 decimal digits. However, the only operations we need to perform with the values of C is to compare them with k and to subtract them from k in cases when k is larger. From this point of view, there's no any difference between the number containing 59 decimal digits and, say, number 1,000,000,001 (because the values of k are always no more than 1,000,000,000). So to get rid of this difficulty, we can modify the recurrence as follows: C(a, b) = min(C(a-1, b) + C(a-1, b-1), 1,000,000,001).

Java implementation of this approach is given below.


public class TheDictionary {

  public String find(int n, int m, int k) {

    int[][] C = new int[201][201];

    C[0][0] = 1;

    for (int i=1; i<=200; i++) {

      C[i][0] = 1;

      C[i][i] = 1;

      for (int j=1; j<i; j++)

        C[i][j] = Math.min(C[i-1][j] + C[i-1][j-1], 1000000001);

    }



    if (C[n+m][m] < k) return "";



    String s = "";

    int L = n + m;

    for (int i=0; i<L; i++) {

      if (n>0 && C[n+m-1][m] >= k) {

        s += "a";

        n--;

      } else {

        s += "z";

        k -= C[n+m-1][m];

        m--;

      }

    }



    return s;

  }

}

TheLongPalindrome

Used as: Division One - Level Two:

Value 500

Submission Rate 107 / 624 (17.15%)

Success Rate 74 / 107 (69.16%)

High Score Petr for 413.22 points (13 mins 37 secs)

Average Score 253.18 (for 74 correct submissions)

Let's denote the number of strings (not necessarily palindromes) of length L containing at most C distinct characters as F(L, C), and the number of palindromes of length L containing at most k distinct characters as G(L). We need to find S = G(1) + G(2) + ... + G(n)). Note that any palindrome of even length L is uniquely determined by its first L/2 characters (the rest is just the inverse of these characters), so for even L we have G(L) = F(L/2, k). Similarly, for odd L any palindrome is determined by its first (L+1)/2 characters, so G(L) = (F(L+1)/2, k) for odd L. If we apply these formulas to the expression for S, we get S = 2 * (F(1, k) + F(2, k) + ... + F(n/2, k)) for even n and S = 2 * (F(1, k) + F(2, k) + ... + F((n-1)/2, k)) + F((n+1)/2, k).

Let's define FF(L, C) as the number of strings of length L containing exactly C distinct characters. It's easy to see that F(L, C) = FF(L, 1) + FF(L, 2) + ... + FF(L, C). Our goal is to find a recurrence for FF. Note that there are two ways to get a string of length L with exactly C distinct characters. We can take a string of length L-1 containing exactly C distinct characters (there are FF(L-1, C) such strings) and append any of its C characters to it. Alternatively, we can take a string of length L-1 containing exactly C-1 distinct characters (there are FF(L-1, C-1) such strings) and append to this string any of 26 - (C - 1) = 27 - C characters, that are not in it. So we get the following recurrence: FF(L, C) = C * FF(L-1, C) + (27 - C) * FF(L-1, C-1).

Using this recurrence, we can calculate all values of FF, than all values of F and than, finally, S. It sounds good, but, unfortunately, is too slow. To speed things up, let's use matrices. Introduce the following sequence of vectors V:


    V(L) = [FF(L, 0) FF(L, 1) ... FF(L, 25) FF(L, 26)]

Now let's build a 27x27 matrix A such that V(L) = V(L-1) * A. This equation means that FF(L, C) is obtained as a scalar product of V(L-1) and C-th (0-based) column of A. Therefore, C-th column of A must contain exactly two non-zero elements: A_{C-1, C} = 27 - C and A_{C, C} = C. Overall, matrix A must look like this:


        [ 0  26 0  0  ... 0  0  0  ]

        [ 0  1  25 0  ... 0  0  0  ]

        [ 0  0  2  24 ... 0  0  0  ]

        [ 0  0  0  3  ... 0  0  0  ]

    A = [ ........................ ]

        [ ........................ ]

        [ ........................ ]

        [ 0  0  0  0  ... 24 2  0  ]

        [ 0  0  0  0  ... 0  25 1  ]

        [ 0  0  0  0  ... 0  0  26 ]

Using the equation V(L) = V(L-1) * A several times, we get V(L) = V(L-1) * A = (V(L-2) * A) * A = V(L-2) * A^2 = ... = V(0) * A^L, where V(0) = [1 0 0 ... 0 0]. Remember that for even n's we have S = 2 * (F(1, k) + F(2, k) + ... + F(n/2, k)). As F(L, k) is the sum of first k+1 elements of V(L), to calculate S it's enough to know the sum V(1) + V(2) + ... + V(n/2) = V(0) * (A^1 + A^2 + ... + A^{n/2}). Similarly, for odd n we need to know the sum V(0) * (A^1 + A^2 + ... + A^{(n-1)/2}) and the value V(0) * A^{(n+1)/2}.

So, in order to solve the problem we now need only to be able to calculate A^p and A^1+A^2+...+A^p fast enough, where p is an arbitrary integer. Both subproblems are kind of classic and occur time to time in SRMs. Directly using properties A^{2q} = (A^q)^2 and A^{2q+1} = A^{2q} * A depending on parity of p, we can implement recursive function that calculates A^p using O(log p) matrix multiplications. Similarly, using properties A^1+A^2+...+A^{2q} = (E+A^{q})*(A^1+A^2+...+A^{q}) and A^1+A^2+...+A^{2q+1} = (A^1+A^2+...+A^{2q}) + A^{2q+1}, we can calculate A^1+A^2+...+A^p using O(log^2 p) matrix multiplications.

Check Petr's fastest submission on this problem for a clean implementation of this approach.

Exercises

Describe how one can calculate the sum A^1 + A^2 + ... + A^p using only O(log p) matrix multiplications, thus getting an O(k^3 * log p) solution on this problem.
Find a solution for this problem that has an assymptotical complexity of O(k^2 * log p).

TheStringGame

Used as: Division One - Level Three:

Value 1000

Submission Rate 14 / 624 (2.24%)

Success Rate 4 / 14 (28.57%)

High Score ACRush for 592.46 points (28 mins 0 secs)

Average Score 522.75 (for 4 correct submissions)

This problem looks like a usual 2 players combinatorial game. If you have never solved similar problems, please first read the following tutorial. We can start by implementing a recursive memoized function that calculates the outcome for every possible string than can occur in the game. The pseudocode for this function looks as follows:


Outcome solve(String s)

  If solve was already called for s

    Then just return the outcome from memo



  Generate the list M of possible moves from s



  If M is empty

    Result := (Draw)

  Otherwise 

    Create empty list of outcomes L

    For each string ss in M

      Add solve(ss) into L



    If there is at least one defeat in L

    Then

      Let X be the number of moves in the fastest of all defeats in L

      Result := (Victory in X+1 moves)

    Otherwise

      If there is at least one draw in L

        Result := (Draw)

      Otherwise

        Let X be the number of moves in the slowest of all victories in L

        Result := (Defeat in X+1 moves)



  Add Result to memo for string s

  Return Result

End

It's not hard to implement this pseudocode in your favorite programming language, but there is one problem. The number of reachable game states from a starting position is 3^N, where N is the number of 'X' characters in the input string. This is because each 'X' character can be further replaced by 'O' or 'L', so there are 3 possibilities for each 'X' position. In the worst possible case it's 3^16 = 43,046,721, what is too many. During the match, all 4 coders who solved this problem used the following workaround. If the number of 'X's doesn't exceed 14, the number of states is at most 3^14 = 4,782,969 and well written C++ solution can solve these cases in time. And the number of cases when there are 15 or 16 'X' characters is very small. There are actually 34 such cases: X[15], X[16], X[i]OX[15-i], X[i]LX[15-i], 1 ≤ i ≤ 15, where X[cnt] denotes cnt 'X' characters in a row. For each of these cases you can compute the answer on your computer by using the same method and waiting, say, 5, 10 or even 20 seconds. Than you can just insert these 34 answers into your code. For example, see the following submission by ACRush.

Of course, it would much more interesting to solve the problem without precomputation and in a slower language like Java. To do this, let's generalize the game a bit. Suppose that game state consists of not one, but possibly several strings. In one move, a player should choose one of the strings and make a move there. As only some player makes "LOL" within some string, he becomes a winner.

Let's give several simple facts that allow to reduce the number of states significantly:

In a generalized game, the order of game strings doesn't matter.
We can reverse any game string and the answer will not change. This is because "LOL" is a palindrome.
If A and B are any game strings, then the answer for the string A + "L" + B is the same as the answer for the generalized game with 2 strings: A + "L" and "L" + B. The reason is that "LOL" can only formed within the string A + "L" or within the string "L" + B.
If A and B are any game strings, then the answer for the string A + "OO" + B is the same as the answer for the generalized game with 2 strings: A and B. If the number of "O"s separating A and B is more than 2, this fact still holds. Note, however, that it's not true for just one "O".
The answer for the string "O" + A is the same as for the string A. The answer for the string B + "O" is the same as for the string B.
The answer for the string "LXO" + A is the same as for the string "XO" + A. The answer for the string B + "OXL" is the same as for the string B + "OX".
Suppose that for some string there's no way for "LOL" to appear after any moves. An example of such string is "XLLX". Since the actual moves within this string are irrelevant, we no longer need to know the exact string in order to proceed the game, it's enough just to know the number of "X"s in this string. In case of generalized game, it's enough to know the total number of "X"s in all such strings. Let's call these 'X' characters free.

Using these facts, we can define the state as an integer, that gives the number of free 'X' characters, and an array of strings, that gives all game strings. In addition, the state will satisfy to the following properties:

The array of strings is sorted in non-descending order. Each string is lexicographically less (or the same) as its reversal.
For any string in the array: it doesn't start from "O" or "LXO"; it doesn't end with "O" or "OXL"; it doesn't contain occurrences of "OO"; it doesn't contain letters 'L' except, possibly, its first and its last characters.
For any string in the array, it's possible for "LOL" to appear within this string after some moves. In particular, it means that the length of the string is at least 3.

These optimizations reduce the number of states significantly, but still there are pretty many states and the work done per one state is increased. Fortunately, most of the states are such that it's possible to win just in 1 move (let's call these states fast winning). Since generating the list of moves from a given state is a very expensive operation, it makes sense to modify things a bit in such way that solve is never called for a fast winning state. Note that fast winning states are very easy to detect: these are exactly the states where at least one game string contains an occurrence of "XOL", "LXL" or "LOX". Our modification works as follows: we just do not include fast winning states onto the list of moves M. As moving into a fast winning state leads to immediate defeat, we should make such move only if there are no other choices, therefore this modification is correct. The only other thing that needs to be changed is a treatment of case when M is empty. If there are no "X" characters in game strings, it's still a draw, otherwise it means that any move leads to a fast winning state, so it's actually a defeat in 2 moves.

After fast winning states are eliminated, the number of considered states becomes quite small. For the worst case of X[16] the number of states is just 19,657. As I've already mentioned, we need to do pretty much of work per state, so the runtime of this approach is still quite slow, but it fits within the limit of 2 seconds. For a possible implementation, you can check my commented Java solution in practice room for this SRM. It passes the worst case of X[16] in 1.4 seconds.