Contest: USPTO Algorithm Followup Challenge
Problem: PatentLabeling2

Problem Statement

    This problem is a follow-up to the PatentLabeling problem. If you haven't seen that problem yet, please read its problem statement. This problem is identical to it except for the changes outlined below:
  1. Only part labels need to be detected.
  2. The value of threshold parameter used for scoring is 0.5.
  3. The definition of similar titles is slightly modified. In the first step of checking whether two titles are similar, all characters in both titles are removed except letters ('a'-'z', 'A'-'Z'), digits ('0'-'9'), '(' (ASCII 40), ')' (ASCII 41), '+' (ASCII 43), '-' (ASCII 45), '=' (ASCII 61), ''' (ASCII 39), '.' (ASCII 46) and '/' (ASCII 47). The other two steps are the same as in the original problem.
  4. The agreements in the "Agreements on figure titles and part label texts" section of the original problem do not apply to this problem. It is still true that most part label texts contain only digits and letters. However, the usage of the other characters '(', ')', '+', '-', '=', ''', '.', '/' is not guaranteed to be consistent with that of the original problem. Subscripts are not designated in any special way in this problem. Some part label texts are fairly long and consist of multiple words separated by space characters in the reference labeling. However, these space characters are not significant, since the first step of checking two titles for similarity removes all of them.
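
As an illustration of the modified first step described in change 3, the following Java sketch keeps only the listed characters and drops everything else, including spaces. The class and method names (LabelNormalizer, normalize) are hypothetical and not part of the problem's interface. The other two similarity steps are defined in the original problem and are not shown here.

```java
final class LabelNormalizer {
    // Punctuation kept by the first similarity step: ( ) + - = ' . /
    private static final String KEPT_PUNCTUATION = "()+-='./";

    // Returns the text with every character removed except ASCII letters,
    // ASCII digits and the punctuation listed above. Case is left unchanged.
    static String normalize(String text) {
        StringBuilder kept = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean isLetter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            boolean isDigit = c >= '0' && c <= '9';
            if (isLetter || isDigit || KEPT_PUNCTUATION.indexOf(c) >= 0) {
                kept.append(c);
            }
        }
        return kept.toString();
    }
}
```

Under this sketch, a multi-word label and the same label written without spaces normalize to the same string, which is why the spaces in the reference labeling are not significant.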


Test data



139 new images were prepared for this problem. Of them, 30 images are provided to you for training, 30 images will be used for provisional tests and the remaining 79 images for system tests. The split of images between the training, provisional and system test sets was completely random. The 10 examples are randomly chosen from the 30 training images.



All images from the original PatentLabeling problem can also be used for training purposes. They can be downloaded here, here and here.



Reusing code and ideas



Your solution can reuse source code and ideas from any solution of the original challenge that won a money prize. More precisely, a solution must have a positive score and have taken 1st or 2nd place in its room for you to be able to reuse it. The list of source code submitted for the challenge can be found here. For each coder, only the last submission he/she made can be reused. If both team members made submissions, then only the submission that scored higher (according to system tests) can be reused. If both submissions received the same score, either of them can be reused.



Additionally, we provide solution descriptions written by global Top-5 finishers (links to their source codes: 1, 2, 3, 4, 5) of the original challenge. You are allowed to reuse source code and ideas from these descriptions as well.



As in the original challenge, your solution can include Apache 2 compatible open source code. More details can be found here.



Special conditions



In order to receive the prize money, you will need to fully document the derivation of all parameters internal to your algorithm. If these parameters were obtained from the training data set, you will also need to provide the program used to generate them. There is no restriction on the programming language used to generate these parameters. Note that none of this data should be submitted anywhere during the coding phase. Instead, if you win a prize, a TopCoder representative will contact you directly in order to collect it.
 

Definition

    
Class:PatentLabeling2
Method:getPartLabels
Parameters:int, int, int[], String[]
Returns:String[]
Method signature:String[] getPartLabels(int H, int W, int[] image, String[] text)
(be sure your method is public)
    
 

Notes

-The match forum is located here. Please check it regularly because some important clarifications and/or updates may be posted there. You can click "Watch forum" if you would like to receive automatic notifications about all posted messages to your email.
-The time limit is 1 minute per test case and the memory limit is 1024 MB. We reserve the right to reduce the time limit if our systems prove not powerful enough and a very long testing queue builds up.
-There is no explicit code size limit. The implicit source code size limit is around 1 MB (submitting code of that size or larger is not advisable). Once your code is compiled, the binary size should not exceed 1 MB.
-The compilation time limit is 30 seconds. You can find information about compilers that we use and compilation options here.
 

Examples

0)
    
US6692404-1
1)
    
US7895094-1
2)
    
US4697734-2
3)
    
US6695741-1
4)
    
US6547688-2
5)
    
US7886128-3
6)
    
US7041030-1
7)
    
US7232393-1
8)
    
US3751749-4
9)
    
US7892135-1

This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2010, TopCoder, Inc. All rights reserved.