Contest: RoadDetector
Problem: RoadDetector

Problem Statement

    

Prize Distribution

              Prize             USD
  1st                          $25,000
  2nd                          $10,000
  3rd                           $5,000
  4th                           $3,000
  5th                           $2,000
  Top Graduate                  $2,500
  Top Undergraduate             $2,500
Total Prizes                   $50,000

Why this challenge matters

The commercialization of the geospatial industry has led to an explosive amount of data being collected to characterize our changing planet. One area for innovation is the application of computer vision and deep learning to extract information from satellite imagery at scale. DigitalGlobe, CosmiQ Works, and NVIDIA have partnered to release the SpaceNet data set to the public to enable developers and data scientists to help solve the road extraction problem.

Today, map features such as roads, building footprints, and points of interest are primarily created through manual techniques. We believe that advancing automated feature extraction techniques will serve important downstream uses of map data including humanitarian and disaster response, as observed by the need to map road networks during the response to recent flooding in Bangladesh and Hurricane Maria in Puerto Rico. Furthermore, we think that solving this challenge is an important stepping stone to unleashing the power of advanced computer vision algorithms applied to a variety of remote sensing data applications in both the public and private sectors.

Objective

Can you help us automate mapping? In this challenge, competitors are tasked with finding automated methods for extracting map-ready road networks from high-resolution satellite imagery. Moving towards more accurate fully automated extraction of road networks will help bring innovation to computer vision methodologies applied to high-resolution satellite imagery, and ultimately help create better maps where they are needed most.

Your task will be to extract navigable road networks that represent roads from satellite images. The linestrings your algorithm returns will be compared to ground truth data, and the quality of your solution will be judged by the Average Path Length Similarity (APLS) metric. See Scoring for details.

Input Files

Satellite images

Four types of images are available for the target areas:

  1. PAN: panchromatic (single channel, 16-bit grayscale, ~30 cm resolution)
  2. MUL: 8-band multi-channel (8*16-bit, ~1.2m resolution). This is the equivalent of the 8-band images from the first competition.
  3. RGB-PanSharpen: pan-sharpened version of Red-Green-Blue bands from the multispectral product (3 channels, 3*16-bit, ~30 cm resolution). This is the equivalent of the 3-band images from the first competition. This is formed by using the PAN image to interpolate 3 bands of the MUL dataset to increase the resolution of the Red, Green and Blue bands.
  4. MUL-PanSharpen: pan-sharpened version of MUL (8 channels, 8*16-bit, ~30 cm resolution)

The images were collected by the DigitalGlobe WorldView-3 satellite over Las Vegas, Paris, Shanghai and Khartoum. The training data contains more than 2780 images; each image covers 400m x 400m on the ground. WorldView-3 is sensitive to light in a wide range of wavelengths; see the specification of the frequency bands here. This extended range of spectral bands allows WorldView-3 imagery to be used to classify the material that is being imaged.

Images are provided in GeoTiff format. Since most image viewer applications will not be able to display 16-bit images and 8-band images, we provide a visualizer tool that you can use to look at the data. See the Visualizer description for details.

Note that the format and scene content of the images are the same as in the 2nd round of the SpaceNet Challenge, but the images themselves are different: the current challenge uses larger tiles than the previous round.

Road networks

The location and shape of known roads are referred to as 'ground truth' in this document. These data are described in CSV files using the following format:

ImageId,WKT_Pix
img1,"LINESTRING (1159.4 1006.1, 940 1011.9, 661.7 1002, 365.8 980.8)"
img1,"LINESTRING (661.7 1002, 670.5 1058.5)"
img999,LINESTRING EMPTY
  • ImageId is a (case sensitive) string that uniquely identifies the image.
  • WKT_Pix specifies the points and edges that represent a road network in Well Known Text format. Only the LINESTRING object type of the WKT standard is supported. The coordinate values represent pixels. The origin of the coordinate system is at the top left corner of the image; the first coordinate is the x value (positive is right) and the second coordinate is the y value (positive is down). The coordinates refer to the PAN images. Note that because of the different resolutions, the coordinates of the same object on the MUL images are different.
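
As an illustration of the format above, the following is a minimal sketch of how one data line of such a CSV file could be parsed. The class name RoadCsvLine and its fields are hypothetical and not part of any provided tool. It assumes that the ImageId itself contains no comma and that the WKT field may or may not be wrapped in double quotes.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a parser for one data line (not the header) of the road network CSV. */
class RoadCsvLine {
    final String imageId;
    final List<double[]> points = new ArrayList<>(); // each entry is {x, y} in PAN pixels

    RoadCsvLine(String imageId) {
        this.imageId = imageId;
    }

    static RoadCsvLine parse(String line) {
        // Assumption: the ImageId contains no comma, so the first comma ends the first field.
        int comma = line.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("Missing field separator: " + line);
        }
        RoadCsvLine result = new RoadCsvLine(line.substring(0, comma).trim());
        // The WKT field may be wrapped in double quotes; drop them.
        String wkt = line.substring(comma + 1).replace("\"", "").trim();
        if (!wkt.startsWith("LINESTRING")) {
            throw new IllegalArgumentException("Only LINESTRING is supported: " + wkt);
        }
        String body = wkt.substring("LINESTRING".length()).trim();
        if (body.equals("EMPTY")) {
            return result; // image without roads: no points
        }
        if (!body.startsWith("(") || !body.endsWith(")")) {
            throw new IllegalArgumentException("Malformed point list: " + wkt);
        }
        for (String pair : body.substring(1, body.length() - 1).split(",")) {
            String[] xy = pair.trim().split("\\s+");
            if (xy.length != 2) {
                throw new IllegalArgumentException("Expected 'x y' but found: " + pair);
            }
            result.points.add(
                new double[] {Double.parseDouble(xy[0]), Double.parseDouble(xy[1])});
        }
        return result;
    }
}
```

Calling RoadCsvLine.parse on the second sample line above would yield the image id img1 and the two points (661.7, 1002) and (670.5, 1058.5); the LINESTRING EMPTY line yields an empty point list.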

Notes on road network definition

  • A LineString is a sequence of points given as {X Y} tuples. Each LineString contains at least 2 points with the exception of the special EMPTY construct, which is used for images showing no roads at all. Each consecutive pair of points in a LineString represents a straight road section between two points.
  • The points that represent road junctions are explicitly listed as elements of a LineString, and they occur more than once either in multiple LineStrings and/or in the same LineString. An example is the point (661.7 1002) in the sample above, which is present in both LineStrings for img1. If a LineString self-crosses or two LineStrings cross each other without the location of the crossing being explicitly listed as a point, then this does NOT define a junction; it represents one of the roads running above the other one.
  • In this challenge all roads are undirected.
  • There are multiple ways of describing the same road network. For example this network
      A---B---C
      |   |   |
      D---E---F
    
    can be described with a single LineString (showing node labels instead of coordinates for better readability):
      LINESTRING (B, C, F, E, B, A, D, E)
    
    or equivalently by a set of two linestrings:
      LINESTRING (A, B, E, D, A)
      LINESTRING (B, C, F, E)
    
    or in several other ways.
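
The equivalence of the two descriptions above can be checked by reducing each description to its set of undirected edges. The sketch below does this for node labels rather than coordinates; the class NetworkEquivalence and the space-separated label format are assumptions made only for this illustration.

```java
import java.util.Set;
import java.util.TreeSet;

/** Sketch: compares road network descriptions by their sets of undirected edges. */
class NetworkEquivalence {
    /** Each argument is one LineString written as space-separated node labels. */
    static Set<String> edges(String... lineStrings) {
        Set<String> result = new TreeSet<>();
        for (String lineString : lineStrings) {
            String[] nodes = lineString.trim().split("\\s+");
            for (int i = 0; i + 1 < nodes.length; i++) {
                String a = nodes[i];
                String b = nodes[i + 1];
                // Roads are undirected, so store the two labels in sorted order.
                result.add(a.compareTo(b) <= 0 ? a + "-" + b : b + "-" + a);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> single = edges("B C F E B A D E");
        Set<String> two = edges("A B E D A", "B C F E");
        System.out.println(single);
        System.out.println(single.equals(two));
    }
}
```

Both descriptions reduce to the same seven edges of the example network, so the two edge sets compare as equal.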

Notes on data

  • Many tiles contain regions for which no image and no road network data are available. Such regions are shown in black in the visualizer tool. There are even tiles where the ratio of such black regions is close to 100%. Your algorithm should handle such cases of missing data.
  • The ground truth data was created manually and it is of high quality. Nevertheless, as in all real-life problems, it may contain errors. Also, you may judge some roads differently than the annotators did.

Downloads

Input files are available for download from the spacenet-dataset AWS bucket. A separate guide is available that details the process of obtaining the data. See also this page for a description of the SpaceNet AWS data and download instructions.

Note that the same bucket holds data for the previous SpaceNet challenges as well. You need only a subset of the bucket content now: download only the files in the SpaceNet_Roads_Competition directory. You will need the following files:

  • SpaceNet_Roads_Sample.tar.gz (728 MB)
  • AOI_2_Vegas_Roads_Train.tar.gz (24.3 GB)
  • AOI_2_Vegas_Roads_Test_Public.tar.gz (8.1 GB)
  • AOI_3_Paris_Roads_Train.tar.gz (5.5 GB)
  • AOI_3_Paris_Roads_Test_Public.tar.gz (1.8 GB)
  • AOI_4_Shanghai_Roads_Train.tar.gz (24.0 GB)
  • AOI_4_Shanghai_Roads_Test_Public.tar.gz (8.0 GB)
  • AOI_5_Khartoum_Roads_Train.tar.gz (4.9 GB)
  • AOI_5_Khartoum_Roads_Test_Public.tar.gz (1.6 GB)

Where

  • SpaceNet_Roads_Sample.tar.gz contains a few sample training images from all 4 cities. Use this if you want to get familiar with the data without having to download any of the large files. Also this is the data you should use when making example submissions.
  • AOI_<n>_<city>_Roads_Train.tar.gz is the training set that belongs to a given city. It contains imagery in the 4 formats described above and also contains the road networks in a CSV file, see the summaryData/ folder.
  • AOI_<n>_<city>_Roads_Test_Public.tar.gz is the testing set that belongs to a given city. It contains imagery but does not contain the road network.

A sample submission file that scores non-zero on the SpaceNet_Roads_Sample dataset is available here.

A note on image ids and image names

The format of an imageId is

AOI_<n>_<city>_img<i>

where the <n>_<city> part can take one of the following values: 2_Vegas, 3_Paris, 4_Shanghai, 5_Khartoum. <i> is a 1-based integer that is unique for an image within a set that belongs to a city. Images within the PAN, MUL, etc folders have names like PAN_<imageId>.tif, MUL_<imageId>.tif, etc. All imageIds are case sensitive.
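
A minimal sketch of checking an imageId against this format and deriving an image file name from it is shown below. The class ImageIds is hypothetical, and the pattern assumes that the index is written without leading zeros, which the statement does not say explicitly.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: validates image ids of the form AOI_<n>_<city>_img<i>. */
class ImageIds {
    // Only the four <n>_<city> values listed above are accepted (case sensitive).
    // Assumption: <i> has no leading zeros.
    private static final Pattern ID =
        Pattern.compile("AOI_(2_Vegas|3_Paris|4_Shanghai|5_Khartoum)_img([1-9][0-9]*)");

    static boolean isValid(String imageId) {
        return ID.matcher(imageId).matches();
    }

    /** Returns the <n>_<city> part, for example "3_Paris". */
    static String cityPart(String imageId) {
        Matcher m = ID.matcher(imageId);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a valid image id: " + imageId);
        }
        return m.group(1);
    }

    /** Builds a file name such as PAN_<imageId>.tif from an image type prefix. */
    static String fileName(String imageType, String imageId) {
        return imageType + "_" + imageId + ".tif";
    }
}
```

For example, ImageIds.fileName("PAN", "AOI_3_Paris_img12") gives PAN_AOI_3_Paris_img12.tif, while an id such as aoi_2_vegas_img1 is rejected because ids are case sensitive.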

Output File

Your output must be a CSV file with identical format to the road network definition files described previously.

ImageId,WKT_Pix
        

The required fields are:

  • ImageId is a (case sensitive) string that uniquely identifies the image.
  • WKT_Pix specifies the points and edges that represent the road network you found. The format is exactly the same as given above in the Input files section. Note that the coordinates must be given in the scale of the PAN images. So if you find a road junction at (40, 20) on the PAN image and at (10, 5) on the corresponding MUL image, then your output file should have a (40 20) coordinate pair listed in one or more of the shape definition LineStrings.
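
If your algorithm works on the MUL images, its coordinates have to be rescaled before they are written out. The sketch below assumes a plain proportional scaling by the ratio of the image sizes, which matches the (10, 5) to (40, 20) example above. The class PanCoordinates and the image widths used in the usage note are hypothetical; read the real sizes from your images.

```java
/** Sketch: rescales a pixel coordinate from a MUL image to the PAN image. */
class PanCoordinates {
    /**
     * Assumption: both images cover the same ground area, so a coordinate scales
     * with the ratio of the image sizes along that axis.
     */
    static double mulToPan(double mulCoord, int panSize, int mulSize) {
        if (panSize <= 0 || mulSize <= 0) {
            throw new IllegalArgumentException("Image sizes must be positive");
        }
        return mulCoord * panSize / mulSize;
    }

    public static void main(String[] args) {
        // Hypothetical sizes: PAN 1300 pixels wide, MUL 325 pixels wide (ratio 4).
        double x = mulToPan(10, 1300, 325);
        double y = mulToPan(5, 1300, 325);
        System.out.println(x + " " + y);
    }
}
```

With a size ratio of 4, the MUL point (10, 5) maps to the PAN point (40, 20), as in the example above.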

Constraints

  • A single file must contain road network definitions for ALL images in the test set.
  • The file may (and typically does) contain multiple lines for the same ImageId.
  • If you found no roads on an image then you must use the LINESTRING EMPTY construct. In this case your file must not contain other lines for the same ImageId.
  • The same road section (defined as consecutive points in a LineString) must not be listed more than once for the same image. I.e. points P1 and P2 must appear next to each other (in any order) in the LineStrings only once for the same image.
  • All road sections must have non-zero length, i.e. consecutive points in the LineStrings must be different.
  • Your output must be a single file with .csv extension. Optionally the file may be zipped, in which case it must have .zip extension. The file must not be larger than 50MB and must not contain more than 1 million lines.
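
The per-image constraints above can be checked locally before uploading. The following is a rough sketch of such a check; the class SubmissionChecks is hypothetical and is not the official validator. It compares coordinates exactly and does not cover the file-level limits (file size, line count, all images present).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch: checks the LineStrings of ONE image against the per-image constraints. */
class SubmissionChecks {
    /** Helper: builds a LineString from flat coordinates x1, y1, x2, y2, ... */
    static List<double[]> line(double... xy) {
        List<double[]> points = new ArrayList<>();
        for (int i = 0; i + 1 < xy.length; i += 2) {
            points.add(new double[] {xy[i], xy[i + 1]});
        }
        return points;
    }

    /** An empty list stands for LINESTRING EMPTY. Returns a list of problems found. */
    static List<String> check(List<List<double[]>> lineStrings) {
        List<String> problems = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (List<double[]> ls : lineStrings) {
            if (ls.isEmpty()) {
                if (lineStrings.size() > 1) {
                    problems.add("EMPTY must be the only line for its image");
                }
                continue;
            }
            if (ls.size() < 2) {
                problems.add("LineString with fewer than 2 points");
            }
            for (int i = 0; i + 1 < ls.size(); i++) {
                String a = ls.get(i)[0] + " " + ls.get(i)[1];
                String b = ls.get(i + 1)[0] + " " + ls.get(i + 1)[1];
                if (a.equals(b)) {
                    problems.add("Zero-length road section at " + a);
                    continue;
                }
                // The order of the two end points does not matter.
                String key = a.compareTo(b) < 0 ? a + "|" + b : b + "|" + a;
                if (!seen.add(key)) {
                    problems.add("Road section listed more than once: " + key);
                }
            }
        }
        return problems;
    }
}
```

For example, a LineString that goes from (0 0) to (1 1) and back to (0 0) is reported, because the same road section appears twice.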

Functions

This match uses the result submission style, i.e. you will run your solution locally using the provided files as input, and produce a CSV or ZIP file that contains your answer.

In order for your solution to be evaluated by Topcoder's marathon system, you must implement a class named RoadDetector, which implements a single function: getAnswerURL(). Your function will return a String corresponding to the URL of your submission file. You may upload your files to a cloud hosting service such as Dropbox or Google Drive, which can provide a direct link to the file.

To create a direct sharing link in Dropbox, right click on the uploaded file and select share. You should be able to copy a link to this specific file which ends with the tag "?dl=0". This URL will point directly to your file if you change this tag to "?dl=1". You can then use this link in your getAnswerURL() function.

If you use Google Drive to share the link, then please use the following format: "https://drive.google.com/uc?export=download&id=" + id

Note that Google has a file size limit of 25MB and can't provide direct links to files larger than this. (For larger files the link opens a warning message saying that automatic virus checking of the file is not done.)
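
The two link formats described above can be produced with simple string handling. The helper class ShareLinks below is a hypothetical sketch; the Dropbox URL in the usage note is made up for illustration.

```java
/** Sketch: builds direct download links in the two formats described above. */
class ShareLinks {
    /** Turns a Dropbox sharing link ending in "?dl=0" into one ending in "?dl=1". */
    static String dropboxDirect(String shareUrl) {
        String tag = "?dl=0";
        if (!shareUrl.endsWith(tag)) {
            throw new IllegalArgumentException("Expected a link ending in " + tag);
        }
        return shareUrl.substring(0, shareUrl.length() - tag.length()) + "?dl=1";
    }

    /** Builds a Google Drive download link from a file id. */
    static String googleDriveDirect(String id) {
        return "https://drive.google.com/uc?export=download&id=" + id;
    }
}
```

For instance, a made-up link such as https://www.dropbox.com/s/abc/answer.csv?dl=0 becomes https://www.dropbox.com/s/abc/answer.csv?dl=1. Whichever service you use, it is worth opening the resulting link in a browser session that is not logged in, to confirm that it starts a download directly.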

You can use any other way to share your result file, but make sure the link you provide opens the filestream directly, and is available for anyone with the link (not only the file owner), to allow the automated tester to download and evaluate it.

An example of the code you have to submit, using Java:

public class RoadDetector  {
  public String getAnswerURL() {
    //Replace the returned String with your submission file's URL
    return "https://drive.google.com/uc?export=download&id=XYZ";
  }
}

Keep in mind that your complete code that generates these results will be verified at the end of the contest if you achieve a score in the top 10, as described later in the "Requirements to Win a Prize" section, i.e. participants will be required to provide fully automated executable software to allow for independent verification of the performance of your algorithm and the quality of the output data.

Scoring

A full submission will be processed by the Topcoder Marathon test system, which will download, validate and evaluate your submission file.

Any malformed or inaccessible file, or one that violates any of the constraints listed in the "Output file" section will receive a zero score.

If your submission is valid, your solution will be scored using the APLS algorithm. The main steps of the algorithm are described below, but there are many finer details that are not given here. For a definitive and detailed algorithm see the source code of the visualizer tool.

For each test tile a tile-level route difference score is calculated as follows.

  1. Let T be the ground truth graph representing the road network on the tile, and P be your predicted graph for the same tile. These graphs are created from the LineString-based representations by treating each point present in the LineString as a graph node and treating each line segment (consecutive pair of points in the LineString) as an edge between the two nodes. Note that these graphs are different from the graph concept used in mathematics because a) the nodes represent physical locations and b) the edges represent more than the presence of a connection between nodes: they have shape (a sequence of straight line segments).
  2. If both graphs are empty (contain no edges) then the score is 1 (the best possible tile-level score). If only one of the graphs is empty then the score is 0 (the worst possible tile-level score). Otherwise all the following steps are executed.
  3. Both graphs are simplified and smoothed.
    • Simplification means that nodes with 2 neighbours are removed. If node B is connected only to nodes A and C then B is removed and a new edge between A and C is created in a way that the shape of the route between A and C remains the same as it was before this step. After this step is repeated for all 2-connected nodes, only road junctions and road endpoints will remain as nodes, but the original shape of the network (the set of line segments) is kept unchanged. (A special case is a cycle: a subgraph which contains only 2-connected nodes and no other connections. Here we keep all of the nodes even if they are 2-connected.)
    • Smoothing means that extra nodes are inserted into the graph to make sure that no edges are longer than 50 meters. For example if the length of an edge between A and B is 120 meters (this is not the Euclidean distance; it is measured along the path between the two end points, and in the most extreme case A and B may be the same), and we don't want to allow edges longer than 50 meters, then 2 extra nodes will be inserted: at 40 and 80 meters from node A.
  4. The APLS score between two graphs G1 and G2 is calculated as follows.
    • Create a copy of graph G2, call this G2'. Inject all nodes of G1 into G2'. This means that for a node n in G1 we find the closest point in G2' and create a node n' there. (Note that such an n' may not correspond to an existing node in G2', it may be created on an edge.) The Euclidean distance between n and n' must be smaller than 4 meters, otherwise no n' will be injected for this given n.
    • Aggregate a total path difference score by listing all valid routes (p1->p2) in G1 and checking what happens with corresponding routes in G2' (p1'->p2'). A route from p1 to p2 is valid if there exists a path between them in G1. (A path can be a direct connection between the two nodes or a longer sequence of edges that starts at p1 and ends at p2.)
      • If there is no matching p1' in G2' then add a difference score of 1 for all valid routes there are in G1 starting from p1.
      • If there is a matching p1' but no matching p2' then also add a difference score of 1.
      • If there are matching p1' and p2' but there is no route between them in G2' then also add a difference score of 1.
      • If there are matching p1' and p2' and they are connected in G2' then add a difference score calculated as
         min(1, abs(d - d') / d),
        
        where d and d' are the length of the paths connecting the two points in G1 and G2', respectively.
    • Let avgDiff be the total difference score divided by the number of valid (p1->p2) routes in G1.
    • The APLS score between G1 and G2 is 1 - avgDiff.
  5. Calculate the APLS score between T and P.
  6. Repeat the APLS score calculation in the other direction: from P to T.
  7. The tile-level score is taken as the harmonic mean of the two scores calculated in the two previous steps.
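
The arithmetic of steps 4 to 7 can be sketched as follows, assuming the route lengths have already been computed. The class AplsSketch is hypothetical and covers only the aggregation, not graph construction, simplification, smoothing or node injection. Encoding an unmatched or missing route as a negative length, and returning 0 for the harmonic mean of two zero scores, are conventions chosen for this sketch only.

```java
/** Sketch: aggregation part of the APLS score, given precomputed route lengths. */
class AplsSketch {
    /**
     * d[i] is the length of the i-th valid route in G1 (assumed positive).
     * dPrime[i] is the length of the corresponding route in G2', or a negative
     * value if an end point has no match or the route does not exist in G2'.
     */
    static double directionalScore(double[] d, double[] dPrime) {
        if (d.length == 0 || d.length != dPrime.length) {
            throw new IllegalArgumentException("Need matching, non-empty route lists");
        }
        double total = 0;
        for (int i = 0; i < d.length; i++) {
            if (dPrime[i] < 0) {
                total += 1.0; // missing node or missing route
            } else {
                total += Math.min(1.0, Math.abs(d[i] - dPrime[i]) / d[i]);
            }
        }
        double avgDiff = total / d.length;
        return 1.0 - avgDiff;
    }

    /** Harmonic mean of the two directional scores (0 if both are 0). */
    static double harmonicMean(double a, double b) {
        return (a + b == 0) ? 0 : 2 * a * b / (a + b);
    }
}
```

For example, three routes of length 100, 200 and 50 whose counterparts are 110, missing, and 50 contribute differences of 0.1, 1 and 0, so the directional score is 1 - 1.1/3. Because the harmonic mean is used, a low score in either direction pulls the tile-level score down strongly.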

Finally tile-level scores are averaged for each city, then globalAvgAPLS is taken as the average of these 4 city level scores.

Your overall score is calculated as 1000000 * globalAvgAPLS.
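
A short sketch of this two-level averaging is given below; the class OverallScore and the sample tile scores are made up for illustration. Each row of the input holds the tile-level scores of one city.

```java
/** Sketch: averages tile scores per city, then averages the city scores. */
class OverallScore {
    static double overall(double[][] tileScoresPerCity) {
        double sumOfCityAverages = 0;
        for (double[] cityTiles : tileScoresPerCity) {
            if (cityTiles.length == 0) {
                throw new IllegalArgumentException("Each city needs at least one tile");
            }
            double citySum = 0;
            for (double score : cityTiles) {
                citySum += score;
            }
            sumOfCityAverages += citySum / cityTiles.length;
        }
        double globalAvgApls = sumOfCityAverages / tileScoresPerCity.length;
        return 1000000 * globalAvgApls;
    }

    public static void main(String[] args) {
        // Made-up tile scores for four cities with different numbers of tiles.
        double[][] scores = {{1.0, 0.0}, {0.5}, {0.5, 0.5, 0.5}, {1.0}};
        System.out.println(overall(scores));
    }
}
```

Since the average is taken per city first, every city has the same weight in the overall score regardless of how many tiles it contains.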

Example submissions can be used to verify that your chosen approach to upload submissions works. The tester will verify that the returned String contains a valid URL and that its content is accessible, i.e. the tester is able to download the file from the returned URL. If your file is valid, it will be evaluated, and detailed scores will be available in the test results. The example evaluation is based on a small subset of the training data: the contents of the SpaceNet_Roads_Sample.tar.gz file are used for this purpose. Though recommended, it is not mandatory to create example submissions. The scores you achieve on example submissions have no effect on your provisional or final ranking. Example submissions can be created using the "Test Examples" button on TopCoder's submission uploader interface.

Full submissions must contain in a single file all the extracted road networks that your algorithm found in all images of the AOI_<n>_<city>_Roads_Test_Public folders, i.e. all images of all 4 cities should be processed into a single submission file. Full submissions can be created using the "Submit" button on TopCoder's submission uploader interface.

Notes on scoring

  • All length measurements are done in pixel space, using a constant pixel size of 0.31 meters. For example the maximum edge length of 50 meters means a 50/0.31 (~161.3) pixel distance.
  • The scorer treats points closer to each other than 0.1 pixels as equal. It is not recommended to place points very close to each other, because the connectivity of the graph may change in a nondeterministic way in this case.
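
The unit conversion above, and the node insertion of the smoothing step, can be sketched as follows. The class Smoothing is hypothetical. Equal spacing of the inserted nodes is an assumption inferred from the 120 meter example (extra nodes at 40 and 80 meters); the source code of the visualizer remains the definitive reference.

```java
/** Sketch: pixel conversion and positions of the extra nodes added by smoothing. */
class Smoothing {
    static final double METERS_PER_PIXEL = 0.31;

    static double metersToPixels(double meters) {
        return meters / METERS_PER_PIXEL;
    }

    /**
     * Returns the offsets, measured along the edge from its first node, at which
     * extra nodes are inserted so that no piece is longer than maxEdge.
     * Assumption: the extra nodes are spaced equally along the edge.
     */
    static double[] extraNodeOffsets(double length, double maxEdge) {
        int pieces = Math.max(1, (int) Math.ceil(length / maxEdge));
        double[] offsets = new double[pieces - 1];
        for (int i = 1; i < pieces; i++) {
            offsets[i - 1] = length * i / pieces;
        }
        return offsets;
    }
}
```

With these definitions, Smoothing.extraNodeOffsets(120, 50) returns the offsets 40 and 80, and Smoothing.metersToPixels(50) is about 161.3.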

Final Scoring

The top 10 competitors after the provisional testing phase will be invited to the final testing round. Within 10 days after the provisional testing phase you are required to submit a dockerized version of your code that we can use to test your system. The technical details of this process are described in a separate document.

Your solution will be subjected to three tests:

First, your solution will be validated (i.e. we will check if it produces the same output file as your last submission, using the same input files used in this contest). Note that this means that your solution must not be improved further after the provisional submission phase ends. (We are aware that it is not always possible to reproduce the exact same results. E.g., if you do online training then the difference in the training environments may result in different number of iterations, meaning different models. Also you may have no control over random number generation in certain 3rd party libraries. In any case, the results must be statistically similar, and in case of differences you must have a convincing explanation why the same result can not be reproduced.)

Second, your solution will be tested against a set of new image files. The number and size of the images in this new set will be similar to those of the set you downloaded as testing data. The scene content will also be similar.

Third, the resulting output from the steps above will be validated and scored. The final rankings will be based on this score alone.

Competitors who fail to provide their solution as expected will receive a zero score in this final scoring phase, and will not be eligible to win prizes.

Additional Resources

  • A visualizer is available here that you can use to test your solution locally. It displays satellite images, your extracted road networks and the expected ground truth. It also calculates detailed scores so it serves as an offline tester. (But note that the visualizer does not enforce the limits on allowed file size and number of lines.)
  • Further details on the scoring metric and also a Python implementation can be found here.

General Notes

  • This match is NOT rated.
  • Teaming is allowed. Topcoder members are permitted to form teams for this competition. After forming a team, Topcoder members of the same team are permitted to collaborate with other members of their team. To form a team, a Topcoder member may recruit other Topcoder members, and register the team by completing this Topcoder Teaming Form. Each team must declare a Captain. All participants in a team must be registered Topcoder members in good standing. All participants in a team must individually register for this Competition and accept its Terms and Conditions prior to joining the team. Team Captains must apportion prize distribution percentages for each teammate on the Teaming Form. The sum of all prize portions must equal 100%. The minimum permitted size of a team is 1 member; the maximum permitted team size is 5 members. Only team Captains may submit a solution to the Competition. Topcoder members participating in a team will not receive a rating for this Competition. Notwithstanding Topcoder rules and conditions to the contrary, solutions submitted by any Topcoder member who is a member of a team on this challenge but is not the Captain of the team are not permitted, are ineligible for award, may be deleted, and may be grounds for dismissal of the entire team from the challenge. The deadline for forming teams is 11:59pm ET on the 21st day following the date that Registration and Submission opens as shown on the Challenge Details page. Topcoder will prepare a Teaming Agreement for each team that has completed the Topcoder Teaming Form, and distribute it to each member of the team. Teaming Agreements must be electronically signed by each team member to be considered valid. All Teaming Agreements are void, unless electronically signed by all team members by 11:59pm ET of the 28th day following the date that Registration and Submission opens as shown on the Challenge Details page. Any Teaming Agreement received after this period is void. Teaming Agreements may not be changed in any way after signature. The registered teams will be listed in the contest forum thread titled "Registered Teams".
  • Organizations such as companies may compete as one competitor if they are registered as a team and follow all Topcoder rules.
  • Relinquish - Topcoder is allowing registered competitors or teams to "relinquish". Relinquishing means the member will compete, and we will score their solution, but they will not be eligible for a prize. Once a person or team relinquishes, we post their name to a forum thread labeled "Relinquished Competitors". Relinquishers must submit their implementation code and methods to maintain leaderboard status.
  • In this match you may use any programming language and libraries, including commercial solutions, provided Topcoder is able to run it free of any charge. You may use open source languages and libraries provided they are equally free for your use, use by another competitor, or use by the client. If your solution requires licenses, you must have these licenses and be able to legally install them in a testing VM (see "Requirements to Win a Prize" section). Submissions will be deleted/destroyed after they are confirmed. Topcoder will not purchase licenses to run your code. Prior to submission, please make absolutely sure your submission can be run by Topcoder free of cost, and with all necessary licenses pre-installed in your solution. Topcoder is not required to contact submitters for additional instructions if the code does not run. If we are unable to run your solution due to license problems, including any requirement to download a license, your submission might be rejected. Be sure to contact us right away if you have concerns about this requirement.
  • If your solution includes licensed software (e.g. commercial software, open source software, etc), you must include the full license agreements with your submission. Include your licenses in a folder labeled "Licenses". Within the same folder, include a text file labeled README.txt that explains the purpose of each licensed software package as it is used in your solution.
  • External data sets and pre-trained models are allowed for use in the competition provided the following are satisfied:
    • The external data and pre-trained models are unencumbered with legal restrictions that conflict with its use in the competition.
    • The data source or data used to train the pre-trained models is defined in the submission description.
    • The external data source must be declared in the competition forum in the first 45 days of the competition to be eligible in a final solution. References and instructions on how to obtain the data are valid declarations (for instance in the case of license restrictions). If you want to use a certain external data source, post a question in the forum thread titled "Requested Data Sources". Contest stakeholders will verify the request and if the use of the data source is approved then it will be listed in the forum thread titled "Approved Data Sources".
  • Use the match forum to ask general questions or report problems, but please do not post comments and questions that reveal information about possible solution techniques.

Award details and requirements to Win a Prize

Final prizes

In order to receive a final prize, you must do all the following:

Achieve a top-five score, computed as the average of the scores of all cities, according to the final system test results. See the "Final Scoring" section above.

Once the final scores are posted and winners are announced, the prize winner candidates have 7 days to submit a report outlining their final algorithm and explaining the logic behind it and the steps of its approach. You will receive a template that helps you create your final report.

If you place in a prize winning rank but fail to do any of the above, then you will not receive a prize, and it will be awarded to the contestant with the next best performance who did all of the above.

Top undergraduate / top graduate award

The highest scoring undergraduate university student (someone actively pursuing a Bachelor’s degree), or team of such students is awarded $2,500. The same applies to the highest scoring graduate student (someone pursuing a Master’s or doctoral degree) or team of such students.

Both prizes are based on the rankings after the final tests. The top undergraduate / graduate prize and the final prizes are not exclusive: the same contestant / team may win a final prize and also one of these two awards. For teams to be eligible for one of these awards, all team members must be eligible for the same award.

Eligibility

To be eligible to win the Graduate and Undergraduate prizes, individuals must provide proof of enrollment in an accredited degree program prior to the end of the challenge.

Employees of In-Q-Tel, Maxar Technologies (DigitalGlobe, Radiant Solutions, SSL, and MDA) and NVIDIA are allowed to participate in the contest but must forgo monetary prizes.

 

Definition

    
Class: RoadDetector
Method: getAnswerURL
Parameters: (none)
Returns: String
Method signature: String getAnswerURL()
(be sure your method is public)
    
 

Examples

0)
    
Test case 1

This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2010, TopCoder, Inc. All rights reserved.