Contest: Pathology Segmentation *TCO17*
Problem: PathImageSegmentation

Problem Statement

    

Konica-Minolta Pathological Image Segmentation Challenge

Prize Distribution

1st place - $10,000

2nd place - $7,000

3rd place - $5,000

4th place - $3,000

5th place - $1,000

Introduction

This contest aims at the segmentation of pathological images, which will support the diagnosis of cancers.

Requirements to Win a Prize

In order to receive a prize, you must do all the following:

  • Achieve a score in the top 5, according to system test results calculated using the Contestant Test Data. The score must be higher than the baseline result (i.e., 750,000). See the "Data Description and Scoring" section below.
  • Create an algorithm that reads in a single 500 * 500 image and outputs the predicted mask in at most 10 minutes, running on an Amazon Web Services m4.xlarge virtual machine. If your solution relies on a GPU, please request one when we contact you; it should then run on an Amazon Web Services p2.xlarge (Tesla K80 GPU) virtual machine.
  • Within 7 days from the announcement of the contest winners, submit:
    • A complete report at least 2 pages long, outlining your final algorithm and explaining the logic behind it and the steps of its approach. More details appear in the "Report" section below.
    • All code and scripts used in your training and final algorithm in 1 appropriately named file (or tar or zip archive). The file should include (1) the scripts and code for training and model creation; and (2) your final model, saved so that it can make predictions without being retrained. Prospective winners will be contacted to set up their training and final model in the appropriate AWS environment.
If you place in the top 5 but fail to do any of the above, then you will not receive a prize, and it will be awarded to the contestant with the next best performance who did all the above.

Background

In the medical field, the digitization of Pathology has lagged behind that of Radiology. The spread of WSI (Whole Slide Imaging), which can digitally capture an entire specimen, has changed the situation, and digitization is now proceeding rapidly. As the amount of digital data grows, the burden of interpretation on pathologists has increased sharply and is approaching the limits of human diagnosis.

Techniques that process this data with machine learning, such as Deep Learning, and apply it to individual cell recognition and cancer diagnosis have also been developed, along with various image recognition contests.

Against this background, Konica Minolta intends to develop a recognition technology that distinguishes between the cancer region and other regions, which is positioned as the basis for all digital pathological image analysis.

The challenge is not easy: currently, an expert pathologist must make a comprehensive judgment while looking at both individual areas and the whole of each image. We believe that with the latest semantic segmentation technology, it should be possible to obtain a model that performs as well as a human.

We look forward to many people participating in this challenge!

Objective

All images are cropped to 500 * 500. 168 annotated images are provided as training data for your development and validation. You are asked to build a model that takes a 500 * 500 image as input and outputs its corresponding mask.

Data Description and Scoring

There are 168 annotated images for training and 162 images for testing. The data can be downloaded through the Google drive link. For each image named $NAME (e.g., i105404), we will provide the following data:

  • $NAME.tif (e.g., i105404.tif). This is the input image and is always available.
  • $NAME_mask.png (e.g., i105404_mask.png). This is the ground truth and is only available for training data. For testing data, it is only used for visualization, so you are not required to submit it.
  • $NAME_mask.txt (e.g., i105404_mask.txt). This is only available for training data. For testing data, it is used to evaluate your score, so you are required to submit it. It contains a 500 * 500 binary matrix. We use the following Python script to turn $NAME_mask.png into this plain text form.
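            from PIL import Image

            def convert_to_binary(truth_png_file, truth_txt_file):
                # Convert a mask image into 500 lines of 500 '0'/'1' characters.
                im = Image.open(truth_png_file)
                pix = im.load()
                with open(truth_txt_file, 'w') as out:
                    # pix is indexed as pix[x, y], so each output line holds
                    # the pixels that share one x coordinate.
                    for x in range(500):
                        for y in range(500):
                            # Any non-zero pixel counts as white (1).
                            out.write(str(int(pix[x, y] > 0)))
                        out.write('\n')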
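
Predicted masks have to be written in the same plain text layout. The following is a minimal sketch of one way to do that without PIL, assuming the prediction is held as a row-major list of lists (mask_rows[y][x]). The names write_mask_txt and read_mask_txt are illustrative and not part of the contest tooling. The loop order is copied from the script above, so each line of the file corresponds to one x coordinate.

```python
SIZE = 500  # images in this contest are 500 * 500


def write_mask_txt(mask_rows, txt_path, size=SIZE):
    """Write a binary mask in the layout produced by convert_to_binary.

    mask_rows[y][x] is assumed to be row-major (y = row, x = column); that
    convention belongs to this sketch, not to the contest. The output follows
    the script's loop order: line x holds the values for y = 0 .. size - 1.
    """
    with open(txt_path, "w") as out:
        for x in range(size):
            out.write("".join("1" if mask_rows[y][x] else "0" for y in range(size)))
            out.write("\n")


def read_mask_txt(txt_path):
    """Read a mask file back into row-major form (mask_rows[y][x])."""
    with open(txt_path) as f:
        lines = [line.strip() for line in f if line.strip()]
    # lines[x][y] in the file becomes rows[y][x] in memory.
    return [[int(lines[x][y]) for x in range(len(lines))]
            for y in range(len(lines[0]))]
```

Reading a training $NAME_mask.txt with read_mask_txt and writing it back with write_mask_txt should reproduce the file, which is a cheap way to check the orientation before submitting.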
            

Your example submissions will be evaluated against the training data. Your full submissions will be evaluated against the testing data. The provisional test scores are based on a fixed subset of 81 images in testing data, while the system test scores are based on the other 81 images in the testing data.

We use a combination of two metrics: micro-F1 and Dice Index (DI). The final score is computed with the following formula.

    Final Score = 1000000.0 * (micro-F1 + DI) / 2.0

micro-F1

Considering all images together, the micro-F1 score will be defined based on pixels. For each pixel in any image, there are 4 cases:

  • True Positive: Your prediction is white (1) and the truth is white (1).
  • False Positive: Your prediction is white (1) and the truth is black (0).
  • False Negative: Your prediction is black (0) and the truth is white (1).
  • True Negative: Your prediction is black (0) and the truth is black (0).

Let TP be the total number of True Positives, FP the total number of False Positives, FN the total number of False Negatives, and TN the total number of True Negatives.

The precision (P) is defined as TP / (TP + FP) and the recall (R) is defined as TP / (TP + FN). The micro-F1 score is computed as 2 * P * R / (P + R). As special cases, when TP + FP equals 0, we define P as 1; when TP + FN equals 0, we define R as 1; when P + R equals 0, we define F1 as 0.
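
For local validation, the micro-F1 definition above can be sketched as follows. This is an illustrative implementation rather than the official scorer; it assumes each mask is a 2D list of 0/1 values, and the function name micro_f1 is hypothetical.

```python
def micro_f1(truths, preds):
    """Micro-F1 over all pixels of all images, pooled together.

    truths and preds are equal-length lists of masks; each mask is a 2D
    list of 0/1 values, and paired masks have the same shape.
    """
    tp = fp = fn = 0
    for truth, pred in zip(truths, preds):
        for truth_row, pred_row in zip(truth, pred):
            for t, p in zip(truth_row, pred_row):
                if p and t:
                    tp += 1
                elif p:
                    fp += 1
                elif t:
                    fn += 1
    # Special cases from the statement: empty denominators give P = 1, R = 1.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because the counts are pooled before P and R are computed, large white regions in a few images weigh more here than they do in the per-image Dice average.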

Dice Index

For each image, given a set of pixels G annotated as white (1) in the ground truth and a set of pixels S predicted as white (1) in your submission, if the intersection of G and S is X, the Dice Index is defined as 2 * |X| / (|G| + |S|). As a special case, when |G| + |S| equals 0, we define the Dice Index as 1.

For a set of images, we will average the Dice Index of each image and use this as DI.
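
The per-image Dice Index, its average, and the final score formula can be sketched in the same style. As before, this is an assumption-laden illustration (2D lists of 0/1 values, hypothetical function names), not the scorer used by the system tests.

```python
def dice_index(truth, pred):
    """Dice Index for one image: 2 * |X| / (|G| + |S|)."""
    g = s = x = 0
    for truth_row, pred_row in zip(truth, pred):
        for t, p in zip(truth_row, pred_row):
            g += 1 if t else 0
            s += 1 if p else 0
            x += 1 if (t and p) else 0
    if g + s == 0:
        return 1.0  # special case: both masks are entirely black
    return 2 * x / (g + s)


def mean_dice(truths, preds):
    """DI for a set of images: the average of the per-image Dice Index."""
    scores = [dice_index(t, p) for t, p in zip(truths, preds)]
    return sum(scores) / len(scores)


def final_score(micro_f1_value, di_value):
    """Final Score = 1000000.0 * (micro-F1 + DI) / 2.0"""
    return 1000000.0 * (micro_f1_value + di_value) / 2.0
```

As a worked example of the formula, a micro-F1 of 0.80 and a DI of 0.70 average to 0.75, which corresponds to a final score of 750,000, the baseline figure quoted in the prize requirements.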

Implementation

During the contest, only your results will be submitted. You will submit code that implements a single function, getURL(), which returns a String containing the URL of your answer (.zip).

This .zip file should include the results (i.e., $NAME_mask.txt) for all training and testing data (330 files in total). Each result is a .txt file containing a 500 * 500 binary matrix, as described above. If any result files are missing, your submission will receive a score of -1.

You may use a different name for the .zip file, but keep the following structure.

    submission.zip
    |-- i105404_mask.txt
    |-- i117557_mask.txt
    |-- ...
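
Since a single missing file leads to a score of -1, it may be worth checking the archive contents before uploading. The sketch below is one possible way to build the flat archive shown above; build_submission, result_dir, and image_names are hypothetical names, and the list of 330 image names is assumed to come from your own copy of the data.

```python
import os
import zipfile


def build_submission(result_dir, image_names, zip_path="submission.zip"):
    """Zip $NAME_mask.txt for every name, failing early if any is missing."""
    expected = [name + "_mask.txt" for name in image_names]
    missing = [f for f in expected
               if not os.path.isfile(os.path.join(result_dir, f))]
    if missing:
        raise FileNotFoundError("missing result files: " + ", ".join(missing[:5]))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in expected:
            # arcname keeps every file at the top level of the archive.
            zf.write(os.path.join(result_dir, f), arcname=f)
    return zip_path
```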

You may upload your .zip file to a cloud hosting service, such as Dropbox, that can provide a direct link to the file. To create a direct sharing link in Dropbox, right-click the uploaded file and select share. You should be able to copy a link to this specific file that ends with the tag "?dl=0". Changing this tag to "?dl=1" makes the URL point directly to your file, and you can then use this link in your getURL() function. Another common option is Google drive. If you choose it, please use the following format to create a direct sharing link: "https://drive.google.com/uc?export=download&id=" + id. You can share your result file in any other way, but make sure the link you provide opens the file stream directly.
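
The two link formats described above can be produced with small helpers like the ones below. This is only a sketch of the string manipulation; the function names are hypothetical, and it does not check that the resulting URL actually serves the file.

```python
def dropbox_direct_link(share_url):
    """Turn a Dropbox share link ending in ?dl=0 into a direct link (?dl=1)."""
    suffix = "?dl=0"
    if not share_url.endswith(suffix):
        raise ValueError("expected a Dropbox share link ending in ?dl=0")
    return share_url[:-len(suffix)] + "?dl=1"


def google_drive_direct_link(file_id):
    """Build the Google drive direct link format given in the statement."""
    return "https://drive.google.com/uc?export=download&id=" + file_id
```

Opening the resulting link in a private browser window and confirming that the download starts immediately is a simple manual check that the link opens the file stream directly.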

Your complete code that generates these results will be tested at the end of the contest.

General Notes

  • This match is rated
  • In this match you may use any open source software. If your solution includes commercial licensed software, even just in the training stage, please ask in the forum if it can be approved. You must include the full license agreements for all software used (including any open source and approved commercial license) with your submission. Include your licenses in a folder labeled "Licenses". Within the same folder, include a text file labeled "README" that explains the usage of 3rd party software and purpose of each licensed software package as it is used in your solution.
  • In this match you may use any programming language and libraries, including commercial solutions, provided Topcoder is able to run it free of any charge. You may also use open source languages and libraries, with the restrictions listed in the next section below. If your solution requires licenses, you must have these licenses and be able to legally install them in a testing VM (see "Requirements to Win a Prize" section). Submissions will be deleted/destroyed after they are confirmed. Topcoder will not purchase licenses to run your code. Prior to submission, please make absolutely sure your submission can be run by Topcoder free of cost, and with all necessary licenses pre-installed in your solution. Topcoder is not required to contact submitters for additional instructions if the code does not run. If we are unable to run your solution due to license problems, including any requirement to download a license, your submission might be rejected. Be sure to contact us right away if you have concerns about this requirement.
  • You may use open source languages and libraries provided they are equally free for your use, use by another competitor, or use by the client.
  • The usage of external resources (pre-built segmentation models, additional pathological imagery, etc) is allowed as long as they are freely available.
  • Use the match forum to ask general questions or report problems, but please do not post comments and questions that reveal information about the problem itself or possible solution techniques.

References and Citations

The following are good references and starting points for learning more about the challenge and possible solutions.

[1] http://www.andrewjanowczyk.com/use-case-2-epithelium-segmentation/

[2] http://www2.warwick.ac.uk/fac/sci/dcs/research/tia/glascontest/

[3] Janowczyk, Andrew, and Anant Madabhushi. "Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases." Journal of pathology informatics 7 (2016).

The pathology images and the portion of the annotation data used in this challenge are from [1,3]. We greatly appreciate Dr. Janowczyk allowing us to use and manipulate the dataset in this challenge.

Terms and NDA

This challenge will follow the standard Topcoder Terms and NDA below:

[1] Standard Terms for TopCoder Competitions v2.1 - https://www.topcoder.com/challenge-details/terms/detail/21193/

[2] Appirio NDA 2.0 - https://www.topcoder.com/challenge-details/terms/detail/21153/

 

Definition

    
Class: PathImageSegmentation
Method: getURL
Parameters:
Returns: String
Method signature: String getURL()
(be sure your method is public)
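
As an illustration of the definition above, a submission could look like the following sketch. It assumes Python is among the languages accepted for this match, which the statement does not confirm, and the returned URL is a placeholder to be replaced with the direct link to your own .zip file.

```python
class PathImageSegmentation:
    def getURL(self):
        # Placeholder URL (an assumption for illustration only): replace it
        # with the direct download link to your own submission .zip file.
        return "https://example.com/submission.zip"
```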
    
 

Examples

0)
    
Seed: 0

This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2010, TopCoder, Inc. All rights reserved.