HMS Challenge #1b
|| Problem Statement
| ||This problem is a follow-up to the MinorityVariants problem. If you haven't seen that problem yet, please read its problem statement first. This problem is identical to it except for the changes outlined below.
The problem with the previous iteration of MinorityVariants was that reads could be grouped so that all reads in the same group belonged (or almost certainly belonged) to the same position. All reads in such a group were therefore either all real or all fake, so it was possible to classify the whole group at once instead of classifying individual reads. In a real-world situation, the reads at the same position can be both real and fake, so such an approach is useless.
This problem reuses the same data as MinorityVariants, with one exception: each k-mer abundance was multiplied by a value chosen uniformly at random from [0.9, 1.1]. The implementation specification and scoring method are exactly the same as for the MinorityVariants problem.
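The data modification described above can be sketched as follows. This is only an illustration of the stated rule (an independent uniform factor in [0.9, 1.1] per abundance), not the organizers' actual generator; the function name `perturb_abundances`, its parameters, and the sample values are assumptions made for this example.

```python
import random


def perturb_abundances(abundances, low=0.9, high=1.1, seed=None):
    """Multiply each abundance by its own factor drawn uniformly from [low, high].

    Hypothetical helper: the name, parameters, and seeding are assumptions,
    not part of the original problem's tooling.
    """
    rng = random.Random(seed)
    return [a * rng.uniform(low, high) for a in abundances]


if __name__ == "__main__":
    # Made-up abundances; two of them start out identical.
    original = [120.0, 120.0, 35.0]
    noisy = perturb_abundances(original, seed=1)
    for before, after in zip(original, noisy):
        print(f"{before:8.2f} -> {after:8.2f}")
```

Because every abundance gets its own factor, values that were equal before the perturbation will generally differ afterwards, while each one stays within 10% of its original value.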
Reusing code and ideas
Your submission is allowed to reuse any code and ideas from the best 5 submissions of the original MinorityVariants challenge (only the final submission from each member). Please note, however, that all these solutions used the k-mer grouping trick and this limits their usefulness for this challenge.
You are also allowed to use the code and ideas from the submission that placed 36th (again, only the final submission). This is the highest-scoring submission that does not attempt to group reads together. You can also download and use the description of this solution and the code used to train it.
You are not allowed to use either code or ideas from any other submission of the original MinorityVariants challenge.
In order to receive the prize money, you will need to describe how your algorithm works and to document the derivation of all parameters internal to your algorithm. If these parameters were obtained from the training data set, you will also need to provide the program used to generate them. There is no restriction on the programming language used to generate these training parameters. Note that none of this material should be submitted anywhere during the coding phase. Instead, if you win a prize, a TopCoder representative will contact you directly to collect it.
The data modification makes it harder to group reads belonging to the same position, but it does not guarantee that such approaches are impossible. Any such approach is nevertheless forbidden for this problem. In other words, your solution must classify each read separately and should avoid trying to group several reads and classify them together. This restriction will be enforced by the client after the match, based on the solution descriptions that the winners need to provide. While the client will try to be as objective as possible, you need to be aware that this restriction is at least partially subjective by its nature. If you have any doubts during the contest about whether your approach is eligible under this restriction, we recommend that you send an email to firstname.lastname@example.org asking specifically whether your intended approach will be eligible. The customer will rule on the eligibility of the described approach in as timely a manner as possible. Please do not post such questions to the forums, since this is against the rules.
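One informal way to sanity-check your own solution against this restriction is to verify that a read receives the same label whether it is classified alone or together with other reads. The sketch below is only a self-check under stated assumptions: it is not the client's eligibility criterion, the plain nucleotide strings stand in for the real read format (which is defined in the MinorityVariants statement), and the names `labels_match_solo`, `per_read`, and `grouped` as well as both toy rules are hypothetical.

```python
def labels_match_solo(classify_batch, reads):
    """Return True if every read gets the same label in the batch as on its own."""
    batch_labels = classify_batch(reads)
    return all(
        classify_batch([read])[0] == label
        for read, label in zip(reads, batch_labels)
    )


def per_read(reads):
    # Toy placeholder rule that looks only at the read itself (not a real model).
    return [1 if len(read) % 2 == 0 else 0 for read in reads]


def grouped(reads):
    # Toy rule that leaks information across reads: a majority vote over the batch.
    votes = per_read(reads)
    majority = 1 if sum(votes) * 2 >= len(votes) else 0
    return [majority] * len(reads)


if __name__ == "__main__":
    sample = ["ACGT", "ACG", "AC"]  # made-up reads for illustration only
    print(labels_match_solo(per_read, sample))
    print(labels_match_solo(grouped, sample))
```

Passing this check does not prove an approach is eligible, but failing it is a strong hint that the labels depend on other reads.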
|Method signature:||int classifyReads(String reads)|
|(be sure your method is public)|
|-||The match forum is located here. Please check it regularly because some important clarifications and/or updates may be posted there. You can click "Watch forum" if you would like to receive automatic notifications about all posted messages to your email.|
|-||The time limit is 10 minutes per test case (this includes only the time spent in your code). The memory limit is 2 gigabytes.|
|-||There is no explicit code size limit. The implicit source code size limit is around 1 MB (it is not advisable to submit code of a size close to that or larger). Once your code is compiled, the binary size should not exceed 1 MB.|
|-||The compilation time limit is 30 seconds. You can find information about compilers that we use and compilation options here.|
This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2010, TopCoder, Inc. All rights reserved.