The purpose of this challenge is to develop an algorithm that learns from crowdsourced human annotations of anomalies (i.e., structures) found in satellite imagery. The task must be performed on-line: the algorithm learns from examples as it runs, starting with minimal prior knowledge about the structures to be recognized.
Background and motivation
Help us solve an 800-year-old mystery! Explorer Albert Lin and his team are conducting a non-invasive survey in the region of the lost tomb of ruler Genghis Khan. Unfortunately, nothing has ever been documented about this archaeological anomaly, and there are few clues about its potential appearance. You can help by using information gathered by the on-line crowd of explorers who used their own perceptive intuition to tag potentially relevant anomalies found in satellite images of the area. We want you to build an active machine learning algorithm that can learn from the human-generated annotations to automate a search that is as powerful and sensitive as the pooled human perception demonstrated by the crowd when defining the unknown.
The prize purse for this challenge is 15,000 USD and a trip to the 2013 TopCoder Open in Washington, DC! The contestants with the four highest final scores will be awarded the following prizes:
1st place 7,000 USD and a trip to the 2013 TopCoder Open
2nd place 4,000 USD
3rd place 2,000 USD
4th place 1,000 USD
TopCoder may *offer* to purchase submissions that did not win any prize if the client is interested in using them.
Additionally, there are two prizes, of 700 USD and 300 USD, for unusual and promising ideas. Only contestants who finish with the 5th or a lower final score are eligible for these prizes. The ideas must be described and submitted in a text document. You do not need to implement your idea (though if it is incorporated into your marathon match final submission, it can be a plus). The submission phase for ideas is separate from the marathon match and will open once the match is over. Additional details will be communicated in the match forum. Evaluation of ideas is subjective, but the main evaluation criterion for the winning ideas is: can the Top-4 marathon match algorithms benefit from this idea, and by how much? TopCoder reserves the right not to award these prizes if no ideas are considered useful enough.
Unlike most image processing contests, this challenge does not come with plenty of training images to learn from offline. Some example images can be downloaded from here, but most of the training must happen during the execution of your solution, by learning from the examples your algorithm sees. More about this process below.
The image annotation process
The annotation data has been created by users in the crowd at the National Geographic exploration site. Feel free to register on the site in order to explore and discover ancient structures yourself. The National Geographic exploration site allows users to annotate satellite images with annotation types such as: rivers, roads, modern structures, ancient structures and other interesting objects. In this contest the focus will be on automatically recognizing modern and ancient human-built structures. All satellite images provided to your algorithm were captured with a resolution of 1 meter per pixel. All the images will be 1236 pixels wide and 630 pixels high. All the images will have at least 50 annotations available for processing.
The learning and testing process
Your algorithm will be executed in a process that simulates an active learning setup. First, your algorithm performs training by examining annotated images; then its accuracy is tested by verifying how well it has learned the features of the structures to be recognized. The algorithm is expected to perform training on-line. The type of object that needs to be recognized can change significantly from test case to test case.
- For each test case the algorithm has access to 839 images. Your algorithm can request to examine any of the images by giving the unique ID of the image. The server provides the raw image data plus some meta data about the annotators who processed the image.
- The algorithm can ask for annotation data for any of the images, the server provides it.
- During training the algorithm can examine as many images as it wishes, and can request annotations from at most N users in total. The maximum number of user annotation requests N depends on the test case and will be in the range between 500 and 8000. With one request you will receive up to 5 annotations that a user made on the specific image.
- The provisional tests will have N set to 800, 2000 and 5000 respectively.
- The system tests will have N set to 500, 1000, 1000, 2000, 4000, 4000 and 8000 respectively.
- You can request each image and each annotation only once.
Implementation, Testing and Scoring
- The server provides 100 test images one by one to your algorithm which should then return an answer by placing its own annotations.
- Accuracy of human built structure recognition against the crowd annotations is calculated. The example test case will ask you to recognize modern structures. 2 provisional test cases will ask for modern structure recognition and 1 provisional test case will ask for ancient structure recognition. The system tests contain 5 test cases that ask for ancient structure recognition and 2 test cases for modern structures.
- Overall score is calculated based on the accuracy of how well the annotations match those placed by the crowd.
Your class will need to implement the following methods.
String init(String[] imageMetaData, int N)
This method will be called only once, at the beginning of the training phase. It receives the maximum number of annotation requests N, along with data about the available training images as an array of Strings, one element per image. Each element of the array is a comma-separated list of records following these rules:
- The first element is the unique identifier of the image.
- The second, third and fourth elements give information about a human annotator who processed this image. These elements are integers representing, respectively, the number of annotations this user created before annotating this image, the total number of annotations the user created during the whole crowdsourcing project, and the number of seconds the user spent between his first and last annotation when processing this image.
- If there are annotations available for this image from more than one annotator, then information about those additional annotators is given in subsequent triplets of elements as defined in the above line. So if we have K annotators, there will be altogether 3*K+1 records.
An example record in the meta data featuring 2 annotators (the first annotator made 90 annotations before inspecting image28.jpg and 150 annotations in his lifetime; it took him 30 seconds to annotate the image) would start with image28.jpg,90,150,30 followed by the second annotator's triplet.
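As a sketch, such a record could be parsed into an image id plus per-annotator triplets like this (the second annotator's values, 120,300,45, are purely illustrative and not taken from the spec):

```java
// Sketch: parse one comma-separated metadata record of the 3*K+1 form
// "imageId,before,lifetime,seconds[,before,lifetime,seconds...]".
public class MetaRecord {
    public final String imageId;
    public final int[][] annotators; // each row: {before, lifetime, seconds}

    public MetaRecord(String record) {
        String[] parts = record.split(",");
        imageId = parts[0];
        int k = (parts.length - 1) / 3;   // K annotators
        annotators = new int[k][3];
        for (int i = 0; i < k; i++)
            for (int j = 0; j < 3; j++)
                annotators[i][j] = Integer.parseInt(parts[1 + 3 * i + j]);
    }
}
```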
The method must return a String in one of the formats specified below.
- If you wish to end the training phase you must return "END" (quotes for clarity only).
- If you wish to examine an image (i.e. receive raw pixel data) you must return the image's unique identifier.
- If you wish to receive annotations for an image then you must return the image's unique identifier followed by a comma and the (0-based) ordinal number of the annotator whose annotations you would like to receive. Keeping with the above example you must return "image28.jpg,1" (quotes for clarity only) to receive the annotations created by the 2nd annotator.
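For illustration, the three return formats above could be built with trivial helpers like these (the class and method names are our own, not part of the required interface):

```java
// Sketch: build the three command strings that init() and the other
// training-phase methods may return.
public class Commands {
    public static String endTraining()           { return "END"; }
    public static String requestImage(String id) { return id; }
    public static String requestAnnotations(String id, int annotator) {
        return id + "," + annotator;  // annotator is the 0-based ordinal
    }
}
```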
String receiveImage(String imageId, int[] image)
This method will be called during the training phase each time you request raw image data. It takes two parameters as input. The first parameter is the unique identifier of the image you requested. The second describes an image as an array of integers as follows:
Height H (in pixels)
Width W (in pixels)
Data of pixel at row 0, column 0
Data of pixel at row 0, column 1
...
Data of pixel at row 0, column W-1
Data of pixel at row 1, column 0
...
Data of pixel at row 1, column W-1
...
Data of pixel at row H-1, column 0
...
Data of pixel at row H-1, column W-1
Rows are numbered from top to bottom. Columns are numbered from left to right. The data of each pixel is a single number calculated as 2^16 * Red + 2^8 * Green + Blue, where Red, Green and Blue are 8-bit RGB components of this pixel.
The method must return a String in the format and with the meaning exactly as described for the init() method.
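As a sketch, the pixel packing described above can be undone with simple bit shifts, and the flat array indexed in row-major order (the helper names are our own):

```java
// Sketch: decode a pixel value packed as 2^16*Red + 2^8*Green + Blue,
// and index the flat image array whose first two entries are H and W.
public class Pixel {
    public static int red(int p)   { return (p >> 16) & 0xFF; }
    public static int green(int p) { return (p >> 8) & 0xFF; }
    public static int blue(int p)  { return p & 0xFF; }

    // Pixel at row r, column c sits at offset 2 + r*W + c (after H and W).
    public static int at(int[] data, int r, int c) {
        int w = data[1];
        return data[2 + r * w + c];
    }
}
```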
String receiveAnnotations(String imageId, int annotator, String[] annotations)
This method will be called during the training phase each time you request annotations. It takes three parameters as input. The first parameter is the unique identifier of the image you requested. The second is the ordinal number of the annotator whose annotations you requested. The third is an array of Strings where each element describes an annotation as a comma separated list of records following these rules:
- The first record defines the type of annotation, and is one of these two Strings: "other" and "structure" (quotes for clarity only). The "other" string indicates that the location was annotated but the type of object at that location should not be recognized in this test case. The "structure" string indicates that the location was annotated by the annotator as being a structure that the algorithm must learn to recognize in this test case. For example, in test cases where you need to recognize modern structures all annotations except modern structures will be shown as "other" and modern structures will be shown as "structure".
- The second and third records define the X and Y coordinates of the annotation, respectively, in pixels. X is from left to right, Y is from top to bottom.
The annotations array may be empty and may contain a maximum of 5 elements.
The method must return a String in the format and with the meaning exactly as described for the init() method.
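A minimal sketch of parsing one such "type,X,Y" annotation string (the class name is our own):

```java
// Sketch: parse an annotation record such as "structure,100,50" or
// "other,320,410" into a typed object.
public class Annotation {
    public final boolean isStructure;
    public final int x, y;

    public Annotation(String s) {
        String[] parts = s.split(",");
        isStructure = parts[0].equals("structure");
        x = Integer.parseInt(parts[1]); // left to right
        y = Integer.parseInt(parts[2]); // top to bottom
    }
}
```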
String[] labelImage(int[] image)
This method will be called during the testing phase many times, once for each test image. It takes raw pixel data as input, exactly as defined for the receiveImage() method. The method must return an array of Strings representing annotations your algorithm places for human built structures. The format of the annotations is a comma delimited string "X,Y,P". The X and Y define the pixel location of the structure and P is the confidence level of your algorithm on this prediction. The confidence value must be between 0 and 1, inclusive. For example, "100,50,0.8" indicates that a structure is located at X position 100 and Y position 50 with a confidence value of 0.8. The returned array may be empty. The distance between any two locations that your algorithm outputs must be more than 80 pixels. In other words, if we draw circles with a radius of 40 pixels centered at each location, these circles must not overlap. Formally, for all unique pairs i and j in your return, (Xi-Xj)*(Xi-Xj)+(Yi-Yj)*(Yi-Yj)>(40+40)*(40+40) must be true.
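One possible way to satisfy the separation constraint, sketched below, is to greedily keep candidate locations in decreasing confidence order and discard any candidate within 80 pixels of one already kept; this is just one valid strategy, not part of the specification:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch: enforce (Xi-Xj)^2 + (Yi-Yj)^2 > 80^2 over all kept predictions
// by greedy selection in decreasing confidence order.
public class Separation {
    public static List<double[]> filter(List<double[]> cands) { // each: {x, y, p}
        cands.sort(Comparator.comparingDouble((double[] c) -> -c[2]));
        List<double[]> kept = new ArrayList<>();
        for (double[] c : cands) {
            boolean ok = true;
            for (double[] k : kept) {
                double dx = c[0] - k[0], dy = c[1] - k[1];
                if (dx * dx + dy * dy <= 80 * 80) { ok = false; break; }
            }
            if (ok) kept.add(c);
        }
        return kept;
    }
}
```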
The scoring for a single test image works as follows:
For each of your predicted locations, the number of structures F marked by users from the crowd within a radius of 40 pixels of your predicted location is counted. If a user placed more than one annotation within the 40 pixel radius, only one of them is counted. Let U be the number of users that annotated the image, then the confidence of the crowd that the location contains a structure is calculated by C=F/U. The score for the prediction will be calculated as s = 2*C*P - P*P. The raw score for an image I is then the sum of s over all your predictions. Note that C and P will be in the range of 0 to 1, inclusive.
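Note that for a fixed crowd confidence C, the per-prediction score 2*C*P - P*P is maximized by reporting P = C (the derivative with respect to P, 2*C - 2*P, vanishes there). A small sketch:

```java
// Sketch: per-prediction score s = 2*C*P - P*P from the scoring rule above.
public class Score {
    public static double s(double C, double P) {
        return 2 * C * P - P * P;
    }
}
```

In other words, the best strategy for a single prediction is to report your honest estimate of the fraction of crowd users who annotated a structure near that location.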
Additionally, for each test case we will calculate your normalized score. It is equal to 1,000,000 * YOUR / BEST, where BEST is the highest positive raw score currently obtained for this test case (considering the last submission from each competitor) and YOUR is your raw score. If your program failed to produce a valid solution, then the raw score is -1 and the normalized score is 0.
Your total score is equal to the arithmetic average of normalized scores on all test cases.
The total time that your algorithm may execute during a single test case is 80 minutes.
A visualizer tool is provided in order for you to test your algorithm locally. It can be downloaded from here.
- The solution can be implemented in C++, Java, C# or VB. Python submissions are not accepted.
- You can include open source code in your submission provided that the open source code is licensed under an acceptable license, is separated from and clearly identified in your code, and your submission in this competition is in compliance with the license. An open source license is acceptable if:
- it is an OSI-approved open source license (listed at http://opensource.org/licenses), or
- it is a license that meets the requirements of the OSI Open Source definition (http://opensource.org/osd), or
- it is an explicit and clear dedication by the author(s) to the public domain.
The question of whether a license is acceptable, and particularly whether a license that is not OSI-approved meets the requirements of the Open Source definition, is complex and will be decided by TopCoder in TopCoder’s sole discretion, and so we strongly recommend that, if you use open source code, you use code that is clearly licensed under an OSI-approved license. Questions about specific open source licenses may be asked in the forums, but given the complexities of license evaluation, it is likely that TopCoder will not be able to respond during the contest.
- You may use any external source of data to train your solution as long as it is publicly available and its licensing terms allow such usage.
- In order to receive the prize money, you will need to fully document the derivation of all parameters internal to your algorithm. If these parameters were obtained from the training data set, you will also need to provide the program used to generate these training parameters. There is no restriction on the programming language used to generate these training parameters. If you used external data sources, then the sources must be clearly identified so that their availability and licensing terms can be verified. Note that none of this material should be submitted anywhere during the coding phase. Instead, if you win a prize, a TopCoder representative will contact you directly in order to collect it. The description must be submitted within 7 days after the contest results are published.