Contest: Fishing MM 2
Problem: FishingForFishermen2

Problem Statement


Prize Distribution

  1. $10,000
  2. $7,000
  3. $5,000
  4. $3,000
  5. $1,000

Requirements to Win a Prize

In order to receive a prize, you must do all the following:

  • Achieve a score in the top 5, according to system test results. See the “Scoring” section below.
  • Create a legitimate algorithm that runs successfully on a different data set with the same fields.

    Hard-coded solutions are unacceptable.
  • Within 7 days of the end of the challenge, submit a complete report at least 2 pages long outlining your final algorithm and explaining the logic behind it and the steps of its approach. The required content and format appear in the “Report” section below.
  • Within 7 days of the end of the challenge, submit all code used in your final algorithm in an appropriately named file (or tar or zip archive). We will contact the winners via email and ask for the file. The naming convention should be memberHandle-FishingForFishermenContest. For example, handle “johndoe” would name his submission “johndoe-FishingForFishermenContest.”
  • You will also be asked to set up your code on a VM, with appropriate scripts (e.g. ./ and ./) for its execution.


We wish to identify the type of fishing being performed by a vessel, based on AIS broadcast reports and contextual data. This second round of the challenge builds on the first round by further classifying each track according to the type of fishing taking place.


Create an algorithm to effectively identify whether a vessel is fishing (and if so, the type of fishing taking place) based on observable behavior and any additional data such as weather, known fishing grounds, etc., regardless of vessel type or declared purpose.

Your algorithm shall utilize the AIS positional data and the combined oceanographic data provided in the downloadable data set.

Your algorithm should then detect vessels that match the profile of behaviors of vessels engaged in fishing. You must identify what type of fishing each vessel is involved in during each of the tracks in the data set.

Data Description

The data is provided as a set of CSV files, one for each vessel track, as well as a ground truth file (for the training data) indicating the type of fishing being performed on each track.

The vessel track data contains the following fields:

  • Track Number
  • Relative Time (seconds from the track start)
  • Latitude (degrees to the north)
  • Longitude (degrees to the east)
  • SOG (Speed Over Ground, knots)
  • Oceanic Depth (meters)
  • Chlorophyll Concentration (milligrams per cubic meter)
  • Salinity (Practical Salinity Units)
  • Water Surface Elevation (meters)
  • Sea Temperature (degrees)
  • Thermocline Depth (meters)
  • Eastward Water Velocity (meters per second)
  • Northward Water Velocity (meters per second)
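As a rough illustration of working with these fields, the sketch below parses the rows of one track file. It assumes (none of this is confirmed by the statement) that columns appear in the order listed above, that values are comma-separated, and that every field except the track number is numeric. The names TRACK_FIELDS and parse_track are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, Union

# Hypothetical short names, in the order the fields are listed above.
# The actual column order and header row (if any) are assumptions.
TRACK_FIELDS = [
    "track", "time_s", "lat", "lon", "sog_knots", "depth_m",
    "chlorophyll", "salinity", "surface_elev_m", "sea_temp",
    "thermocline_m", "water_east", "water_north",
]


def parse_track(lines: Iterable[str]) -> List[Dict[str, Union[str, float]]]:
    """Parse the rows of one track into dicts keyed by TRACK_FIELDS."""
    rows = []
    for raw in csv.reader(lines):
        if len(raw) != len(TRACK_FIELDS):
            continue  # skip blank or malformed rows
        try:
            numbers = [float(x) for x in raw[1:]]
        except ValueError:
            continue  # skip a header row or rows with non-numeric values
        row = {"track": raw[0].strip()}  # keep the track number as text
        row.update(zip(TRACK_FIELDS[1:], numbers))
        rows.append(row)
    return rows
```

In practice the same function can be applied to an open file object, for example parse_track(open(path, newline="")), where path points at one of the downloaded track files.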

The ground truth file for training data will contain the following fields:

  • Track Number
  • Fishing Type

The fishing type will not be included for the testing data.


During the contest, only your results will be submitted. You will submit code which implements only one function, getAnswerURL(). Your function will return a String corresponding to the URL of your answer .csv file. You may upload your .csv file to a cloud hosting service (such as Dropbox) which can provide a direct link to your .csv file.

To create a direct sharing link in Dropbox, right click on the uploaded file and select share. You should be able to copy a link to this specific file which ends with the tag "?dl=0". This URL will point directly to your file if you change this tag to "?dl=1". You can then use this link in your getAnswerURL() function. Your complete code that generates these results will be tested at the end of the contest before prize distribution.
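A minimal sketch of such a submission in Python is shown below. The class name FishingForFishermen2 is taken from the problem name and is an assumption about what the tester expects; the helper to_direct_link and the Dropbox URL are purely hypothetical placeholders.

```python
def to_direct_link(share_url: str) -> str:
    """Turn a share link ending in "?dl=0" into a direct link ending in "?dl=1"."""
    if share_url.endswith("?dl=0"):
        return share_url[:-1] + "1"
    return share_url  # leave any other URL unchanged


class FishingForFishermen2:
    # Assumption: the submitted class is named after the problem.
    def getAnswerURL(self):
        # Hypothetical share link; replace it with the link to your own answer file.
        return to_direct_link("https://www.dropbox.com/s/EXAMPLE/answer.csv?dl=0")
```

Returning a hard-coded string that already ends in "?dl=1" works equally well; the helper only guards against pasting the share link unchanged.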

The answer CSV file should contain several rows, each of the form “Track#, FishingType, Prob”. The probability for any combination of track number and fishing type not specified in the answer CSV will be assumed to be 0.0. Any value greater than 1.0 will be treated as 1.0, and values less than 0.0 will be treated as 0.0.
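The sketch below writes predictions in this row format and clamps probabilities to [0.0, 1.0] up front, so the file already contains the values the scorer would use. The function names, the six-decimal formatting, and the lowercase fishing type labels are assumptions; check the ground truth file for the exact label spelling.

```python
import csv
import io
from typing import Dict, TextIO, Tuple


def clamp(prob: float) -> float:
    """Limit a probability to the range [0.0, 1.0]."""
    return min(1.0, max(0.0, prob))


def write_answer(predictions: Dict[Tuple[int, str], float], out: TextIO) -> None:
    """Write one "Track#,FishingType,Prob" row per (track, fishing type) pair."""
    writer = csv.writer(out, lineterminator="\n")
    for (track, fishing_type), prob in sorted(predictions.items()):
        writer.writerow([track, fishing_type, f"{clamp(prob):.6f}"])


# Usage: pairs that are left out are treated as probability 0.0 by the scorer.
buffer = io.StringIO()
write_answer({(1, "trawler"): 0.8, (1, "seiner"): 0.15}, buffer)
```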


Scoring

Your predictions will be scored against the ground truth using the area under the receiver operating characteristic (ROC) curve. Some of the records in the test set will be used for provisional scoring, and others for system test scoring. (You will not know which records belong to which set.)

The score will be computed from the area under the ROC curve (AuC) using the following method:

  1. The contestant's submission will score each track with a probability that the vessel was engaged in each type of fishing.
  2. Each fishing type will be treated as a binary classifier, and its AuC will be calculated as in steps 3-5.
  3. The true positive rates and the false positive rates are determined as follows:
    • TPR_i = Accumulate[s_i] / N_TPR
    • FPR_i = Accumulate[1 - s_i] / N_FPR
    • with the addition: FPR_0 = 0
    where N_TPR is the total number of fishing records of the given type, N_FPR is the total number of records not of that fishing type, and N_TPR + N_FPR = N (the total number of records with known status in the test).
  4. Then the AuC is determined as a numerical integral of TPR over FPR:

    AuC = Sum [TPR_i * (FPR_i - FPR_(i-1))]
  5. Then the four AuC values are weighted and averaged, then scaled to determine the final score:

    Score = max(1,000,000 * (2 * WeightedAverage - 1), 0)
    The weights are as follows:

    • trawler: 40%
    • seiner: 30%
    • longliner: 20%
    • support: 10%


The following links are included because they were listed with the previous "Fishing for Fishermen" contest. Although not all of them are strictly necessary or directly useful in this iteration, we have kept them for anyone with a general interest in the subject.

Useful site for the entire AIS message (each line of data to be provided is an AIS "sentence"):

ITU document describing “payload” field (field 6 of each message):

How to calculate the NMEA checksum:

The distance between 2 points, given their longitude and latitude, can be calculated using the Haversine formula.
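A small sketch of the Haversine formula follows. It treats the Earth as a sphere with a mean radius of 6371 km (an approximation chosen here, not a value from the statement) and takes latitude and longitude in degrees, as in the track data. The function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Guard against rounding pushing the value slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

Dividing the distance between consecutive reports by the elapsed time gives an implied speed that can be compared with the reported SOG (1 knot is 1.852 km/h).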


Report

Your report must be at least 2 pages long, contain at least the following sections, and use the section and bullet names below.

Your Information

This section must contain at least the following:

  • First Name
  • Last Name
  • Topcoder handle
  • Email address
  • Final code submission file name

Approach Used

Please describe your algorithm so that we know what you did even before seeing your code. Use line references to refer to specific portions of your code.

This section must contain at least the following:

  • Approaches considered
  • Approach ultimately chosen
  • Steps to approach ultimately chosen, including references to specific lines of your code
  • Open source resources and tools used, including URLs and references to specific lines of your code
  • Advantages and disadvantages of the approach chosen
  • Comments on libraries
  • Comments on open source resources used
  • Special guidance given to algorithm based on training
  • Potential improvements to your algorithm


Method signature: String getAnswerURL()
(be sure your method is public)


Example Submission
This example case will download your submission file and verify that all contents refer to valid vessel track numbers and fishing types, and that prediction values parse correctly. It will not perform any scoring; if you wish to split the training data for your own offline testing, that should be done locally.

This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2010, TopCoder, Inc. All rights reserved.