JOIN
Get Time
long_comps_topcoder  Problem Statement
Contest: 2011 TCO Marathon Round 3
Problem: StringCompression

Problem Statement

    In this task we will consider a certain method of data compression. The compressed representation of data is a vector of N strings (S[1], S[2], ..., S[N]), where N <= 8. Each character in each string S[i] is a digit or a lowercase letter ('a'-'z'). Only the digits between '2' and N, inclusive, are allowed, so, for example, for N = 5 the set of allowed digits is {'2', '3', '4', '5'}.



In order to decompress such vector of strings into a single string, it's necessary to call decompress(1) and take the return value. The pseudocode for decompress function is shown below:
string decompress(int id) {
	string res = "";
	for (int i = 0; i < length(S[id]); i++) {
		if (S[id][i] is a letter) {
			res += S[id][i]
		} else {
			res += decompress(parseInt(S[id][i]));
		}
	}
}
In other words, the function goes through the characters of S[id], from left to right. Each letter is directly appended to decompression result. Each digit corresponds to a certain string from S (for example, '3' corresponds to S[3]). We call decompress for the corresponding string and append the return value to decompression result.



In some cases, the decompression routine falls into infinite recursion, making it impossible to decompress the data. Therefore, we impose an additional restriction on the compressed data. Consider a directed graph containing N vertices 1, 2, ..., N. There is an arc from vertex i to vertex j in this graph if and only if S[i] contains an occurrence of digit j. In this task, only such compressed data is allowed that corresponding directed graph doesn't contain a cycle reachable from vertex 1. Note that loop arc is also considered to be a cycle. With this additional restriction, the recursion is always finite and decompression result is well-defined.



Implementation details



You will need to implement the method compress that takes String data and int[] limits as its input parameters. You should compress data into a vector (S[1], S[2], ..., S[N]) as described above. The value of N must be equal to the number of elements in limits. The length of each S[i] should not exceed the value given in i-th (1-based) elements of limits. The return value must be a String[] containing Strings S[1], S[2], ..., S[N], in this exact order.



The decompression result of your return value must not be exactly equal to data, but it must be as similar to data as possible. The similarity between two strings A and B is defined as the number of positions i such that 1 <= i <= min{length(A), length(B)} and i-th (1-based) characters in A and B are the same.



Scoring



Suppose that your return for a test case has a valid format and decompresses to a string ret. Then your score for this test case will be equal to (similarity between ret and data) / length(data).



Your overall score is calculated as follows: for each test case, you get the amount of points equal to YOUR_SCORE / MAX_SCORE, where YOUR_SCORE is your score for this test case and MAX_SCORE is the maximum among scores for this test case achieved by all competitors. There is one exception: if MAX_SCORE = 0, then all competitors get 0 points for this test case.



Test case generation



In the text below, words "chosen" and "random" always imply that uniform distrubution is used, unless another distribution is specified.



Each test case will be generated as follows:
  • N is chosen between 4 and 8, inclusive.
  • limits is generated as an int[] containing N elements, with each element being chosen between 16 and 32, inclusive.
data is generated in several steps:
  • order is chosen as a permutation of {1, 2, ..., N} such that its last element is equal to 1.
  • Strings S[1], S[2], ..., S[N] are generated in the following order: first S[order[1]], then S[order[2]], ..., finally S[order[N]].
  • Each S[order[i]] is generated as follows. Its length is set to limits[order[i]]. Each character is initially set to '?'. Then, for each j, 1 <= j < i, in order from i-1 down to 1, r distinct '?' characters are chosen and replaced with order[j] characters, where r is generated an min{m, random integer between 1 and 4} and m is the number of remaining '?' characters. Finally, if there are any '?' characters left, each of them is replaced with a random lowercase letter.
  • Strings S[1], S[2], ..., S[N] are decompressed into data.
  • If the length of data is greater than 100000, then data is replaced with the prefix of data containing first 100000 characters.
  • Real number errors is generated as 0.0, random value between 0.0 and 0.04, random value between 0.04 and 0.16, random value between 0.16 and 0.64 or random value between 0.64 and 2.56 (each of these 5 cases is equiprobable).
  • X times the following procedure is repeated: a random position within data is chosen and the character at this position is replaced with a random lowercase letter. X is calculated as floor(errors * length(data)).
Tools



An offline tester is provided. You can check its source code for the exact scoring and test case generation implementations.
 

Definition

    
Class:StringCompression
Method:compress
Parameters:String, int[]
Returns:String[]
Method signature:String[] compress(String data, int[] limits)
(be sure your method is public)
    
 

Notes

-The memory limit is 1024 MB and the time limit is 30 seconds (which includes only time spent in your code).
-There is no explicit code size limit. The implicit source code size limit is around 1 MB (it is not advisable to submit codes of size close to that or larger). Once your code is compiled, the binary size should not exceed 1 MB.
-The compilation time limit is 30 seconds. You can find information about compilers that we use and compilation options here.
-The processing server specifications can be found here.
-There are 10 example test cases and 100 full submission test cases.
 

Examples

0)
    
S[1] = bpa4h2edn2b3k2d3z2e43b
S[2] = 3i3ufx4uku34gzfob4vlozd3m
S[3] = juquvqvrudlpqezzzchprdwi
S[4] = kozid3pvevaytgjxdmjpfbsfhxrtovqp
errors = 0.003570648105904275
length(data) = 1311
1)
    
S[1] = 33xe4nav256774632246n65237m7
S[2] = 7mz6b46i5wpd6sovsnjt5xn6hht55ca
S[3] = 5v4k4imw2667uo44x62t5f65n
S[4] = f6c6tn6plkzax7gtgud
S[5] = ebr4x7c4t46tuedf7j6ha7s4nzptd6
S[6] = gutnshwwqyfrcrlprmaurwieeus
S[7] = klgkjnfg6iwhfcdwaomtlux
errors = 0.7667244724729031
length(data) = 58950
2)
    
S[1] = 544ocr256cvo74ywk34se66hjt5
S[2] = 665t75iw65ei7457o
S[3] = 67l25p5z4l64x7ycxe7z6
S[4] = ipfjqgg6sbm6zx5tmg5jfyfcmdy
S[5] = pptecff6woegvwpukyuavhnktevqnfq
S[6] = bnecgicbuhyqfqgarrmxkozjk
S[7] = e45h5a6kj6qsjbiragtzodcutjtdsuc
errors = 2.0263739888680203
length(data) = 6205
3)
    
S[1] = 23w2kik43a2dez3mrj
S[2] = empd4lqbkba4ev33l4ljiv4rxfes
S[3] = cforseutrzvouuoppwybxjxxbjmdejn
S[4] = g3dpdqmecboxwo3zcdwa
errors = 0.0
length(data) = 1396
4)
    
S[1] = bky3ckna4nt3o552tkba5unj35q
S[2] = nnm3yq4v5c4sjzyiiecdes5xf53y4c
S[3] = nr55pbqo5em4v5jruwn4
S[4] = xumgyhugkpwenmweqg
S[5] = gwod44gi4bu4cvqsgpn
errors = 1.0960011323141683
length(data) = 2711
5)
    
S[1] = c4frd4br33jgqhp4cb2kzt5qxc2k5et3
S[2] = p5yokpgx5ytbnacgbi5lra5
S[3] = d2is42c5ftpa2gr5zqsuezcub2fk4p
S[4] = vvn2nt52vllhrxoznoa
S[5] = cgeyyfebjljifbhalenuogap
errors = 0.01715661382260626
length(data) = 4320
6)
    
S[1] = c6h7u358366573678438p2l8
S[2] = ynrquifpgcqavcm3equxwneks3lslwc
S[3] = aazrvpndtuvlfgufoxqjuqqyugjn
S[4] = 3xkes3att5jb7ne557jripd72z3d
S[5] = jlzftt22pc2xv2zc3fcy3nv
S[6] = 3n3vt4iawgvtirnfk77q2kd32oq47725
S[7] = vb23fegohgu3olb2ddl5i22
S[8] = 7627fez2l5737262535s4
errors = 0.5985033504889105
length(data) = 100000
7)
    
S[1] = c233jtmskmckqbl3yzna3q4kypapake
S[2] = bouunakenlnodraual
S[3] = vyr2royquonyls2cw
S[4] = ub3wn32uzudhfbqduzurfcam2
errors = 0.15223077408241453
length(data) = 406
8)
    
S[1] = cn4jm255623bq4lz65q2urzgm62l6cc
S[2] = f5wh66vfpa6oxsnwp4voyqsrpd
S[3] = o664q52d655y2264
S[4] = xubwh6svaqirgunupnyboscpbbqdrdu
S[5] = vc4jz6eufd4sotbwomus
S[6] = mkowbzcazqengucvuvezdgycpnfwc
errors = 1.156040229448038
length(data) = 3790
9)
    
S[1] = bp2ahqjm4k22skgghrvkmnv32lajs
S[2] = oqbjwemtozqgzsuzlbosazxwgitqe
S[3] = vjjf4wyjuhwg2usmbm44nokf
S[4] = ze2hz22pmolu2ulhxz
errors = 0.011120480332477962
length(data) = 708

This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2010, TopCoder, Inc. All rights reserved.