Get Time
long_comps_topcoder  Problem Statement
Contest: DATCompression
Problem: DATCompression

Problem Statement

    DNA sequencing has revolutionized biological research and is set to transform medicine as well as this technology scales. Machines like Ion Torrent's Proton sequencer will deliver a $1000 genome in a day. In order to deliver these data at scale the data need to be as compact and fast as possible. Please be a part of history delivering the first $1000 genome and help bring this technology to the clinic to save lives by making the data storage as efficient as possible in both time and space.

Each well on the Ion Torrent chip is a pH sensor producing a number of counts proportional to the change in hydrogen ions in the wells. Each well occupies a unique x, y position on the chip and produces a vector of 14-bit integers measuring the concentration of hydrogen ions in the well over different time points for a single flow.

Input data

The test data in this problem consists of 39 files. 19 of them (randomly selected) are available for download. The other 20 files are randomly split between submission (6 files) and system (14 file) test cases.

Each line of the files consists several tab-separated integers. The first two integers are x and y -- the coordinates of a position on the chip. Each next integer is the result of a single measurement. The measurements are listed in the same order as they are done. Each line contains the same number of measurements (we will denote it as L). The measurements represent the concentration of hydrogen ions in the well (x, y) and should be changing relatively smoothly over time (t = 0, 1, ..., L-1). The coordinates are always listed in increasing order of x, breaking ties by increasing order of y. In order words, if X and Y are the amounts of different x-coordinates and y-coordinates, respectively, then the order of coordinates in the file is (0, 0), (0, 1), ..., (0, Y-1), (1, 0), (1, 1), ..., (1, Y-1), ..., (X-1, 0), (X-1, 1), (X-1, Y-1). In each file (including the ones used for submission and system tests), you can assume that the following holds: 100 <= X <= 110, 100 <= Y <= 110, 50 <= L <= 75, each measurement value is in 0..16383 (inclusive).

Implementation details

You will need to implement 3 methods: init, compress and decompress. The grader will call init in the beginning of testing process. You are free to do any initialization there. The runtime will be limited by 10 seconds and the time spent in this method does not matter for your score.

compress will take the data to be compressed as int[] data. It will contain the following data (in this exact order), each integer in a separate element: X, Y, L, L measurements for the 1st line, L measurements for the 2nd line, ..., L measurements for the (X*Y)-th line. The first two integers (x and y coordinates) in each line are omitted because of being redundant. The returned value of compress is a compressed representation of data, where each element must be in 0..255 (inclusive).

The returned value for compress gets passed as int[] data to your decompress implementation. It needs to decompress the data back and return a int[] with exactly the same contents as int[] that was provided to your compress method.

The data is of relatively small volume, so in order to ensure better time measurement, the grader program will call compress and then decompress 200 times (using the same data). The contents of int[] that your solution returns from compress must always be the same and decompress must always correctly decompress the data back. The total time spent in compress and decompress methods must not exceed 30 seconds.


If your solution violates time or memory limit, crashes or violates any of restrictions specified above, the score for a test case will be 0. Otherwise, the score is calculated as 1,000,000 * (CompressionRatio + 1.5 * (2.5 / max{2.5, Time})^0.75). Time is the total time (in seconds) spent in compress and decompress methods. CompressionRatio is 2 * A / B, where A is the number of elements in data provided to compress method and B is the number of element in the return value from compress method.

Your overall score on a set of test cases is just the sum of scores obtained on separate test cases.

Special conditions

Only C++ submissions are allowed in this contest.

In order to receive the prize money, you may need to provide a description of how your solution works. Note that this description should not be submitted anywhere during the coding phase. Instead, if you win a prize and we need this description from you, a TopCoder representative will contact you directly.

The TopCoder Marathon Framework is organized so that you are able to pass data between different calls of your methods. Passing any data between compress and decompress (except compressed version of input) contradicts to the nature of the problem and therefore is not allowed. In order to make easier to check that your solution does not break this rule, the following restrictions must hold for your solution:
  1. You are allowed to have global/static data. init can work with it without limitations. compress and decompress can only read this data, but can't modify it.
  2. All fields in DATCompression class can only be read-only within compress and decompress. The non-static fields in other classes, if you have them, are allowed to be modified.
  3. The return value from compress must have a well interpretable general meaning not tied to the current run of your solution. In particular, you are not allowed to pass pointers to data through the return value.
Note the restrictions above imply that each call of compress and decompress needs to allocate all memory necessary for computations and return value and no such memory can be shared between different calls.

We would like to emphasize that any way to share data between compress and decompress (other than compressed input) is not allowed, even if it is not specifically covered by restrictions above, and can be a reason for disqualification.


Method signature:int[] compress(int[] data)
Method signature:int[] decompress(int[] data)
Method signature:int init()
(be sure your methods are public)


-Only C++ submissions are allowed in this competition.
-Multithreading in your solution is not allowed.
-The memory limit is 2 gigabytes. The time limit is 10 seconds (for init method) and 30 seconds (for all time spent in compress and decompress methods).
-There is no explicit code size limit. The implicit source code size limit is around 1 MB (it is not advisable to submit codes of size close to that or larger). Once your code is compiled, the binary size should not exceed 1 MB.
-The compilation time limit is 30 seconds. You can find information about compilers that we use and compilation options here.
-The processing server specifications can be found here.



This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2010, TopCoder, Inc. All rights reserved.