Requirements to Win a Prize
In order to receive a prize, you must do all the following:
- Achieve a score in the top 5, according to system test results. See the “Scoring” section below.
- Create a legitimate algorithm that runs successfully on a different data set with the same fields.
Hard-coded solutions are unacceptable.
- Within 7 days of the end of the challenge, submit a complete report, at least 2 pages
long, outlining your final algorithm and explaining the logic behind it and the steps of
your approach. The required content and format appear in the “Report” section below.
- Within 7 days of the end of the challenge, submit all code used in your final algorithm in an
appropriately named file (or tar or zip archive). We will contact the winners via email and ask
for the file. The naming convention should be memberHandle-FishingForFishermenContest. For example,
handle “johndoe” would name his submission “johndoe-FishingForFishermenContest.”
- You will also be asked to set up your code on a VM, with appropriate scripts
(e.g. ./train.sh and ./run.sh) for its execution.
We wish to identify the type of fishing being performed by a vessel, based on AIS broadcast
reports and contextual data. This second round of the challenge builds upon the first: rather
than only detecting fishing activity, you will now further classify the type of fishing taking place.
Create an algorithm that effectively identifies whether a vessel is fishing (and, if so, the
type of fishing taking place) based on observable behavior and any additional data, such as
weather and known fishing grounds, regardless of vessel type or declared purpose.
Your algorithm shall use the AIS positional data and the combined oceanographic data provided
in the downloadable data set.
Your algorithm should then detect vessels whose behavior matches the profile of vessels engaged
in fishing. You must identify the type of fishing each vessel is engaged in during each track
in the data set.
The data is provided as a set of CSV files, one for each vessel track, as well as a ground truth
file (for the training data) indicating the type of fishing being performed on each track.
The vessel track data contains the following fields:
- Track Number
- Relative Time (seconds from the track start)
- Latitude (degrees to the north)
- Longitude (degrees to the east)
- SOG (Speed Over Ground, knots)
- Oceanic Depth (meters)
- Chlorophyll Concentration (milligrams per cubic meter)
- Salinity (Practical Salinity Units)
- Water Surface Elevation (meters)
- Sea Temperature (degrees)
- Thermocline Depth (meters)
- Eastward Water Velocity (meters per second)
- Northward Water Velocity (meters per second)
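As a sketch, one might load a track file like this. The field names are hypothetical, and it is assumed that the columns appear in the order listed above and that the files carry no header row; adjust to the actual files.

```python
import csv
import io

# Hypothetical names, assumed to follow the field order listed above.
TRACK_FIELDS = [
    "track_number", "relative_time_s", "latitude", "longitude",
    "sog_knots", "depth_m", "chlorophyll_mg_m3", "salinity_psu",
    "surface_elevation_m", "sea_temp_deg", "thermocline_depth_m",
    "east_velocity_m_s", "north_velocity_m_s",
]

def read_track(fileobj):
    """Parse one vessel-track CSV into a list of dicts with float values."""
    rows = []
    for raw in csv.reader(fileobj):
        if not raw:
            continue
        rows.append({name: float(val) for name, val in zip(TRACK_FIELDS, raw)})
    return rows

# Tiny illustrative record (values are made up):
sample = io.StringIO("1,0,59.1,10.5,4.2,120,0.8,35.1,0.1,9.5,40,0.2,-0.1\n")
track = read_track(sample)
```

From rows like these you can derive behavioral features (speed profiles, turning rates, time spent in an area) to feed a classifier.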
The ground truth file for training data will contain the following fields:
- Track Number
- Fishing Type
The fishing type will not be included for the testing data.
During the contest, only your results will be submitted. You will submit code that implements
only one function, getAnswerURL(). Your function will return a String corresponding to the URL
of your answer .csv file. You may upload your .csv file to a cloud hosting service (such as
Dropbox) that can provide a direct link to the file.
To create a direct sharing link in Dropbox, right-click the uploaded file and select Share.
You should be able to copy a link to this specific file ending with the tag "?dl=0". If you
change this tag to "?dl=1", the URL will point directly to your file, and you can then use
it in your getAnswerURL() function. Your complete code that generates these results will be
tested at the end of the contest before prize distribution.
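A minimal sketch of the submitted function, here in Python (the Dropbox URL is a made-up placeholder; substitute the direct link to your own answer file):

```python
def to_direct_link(shared_url):
    """Turn a Dropbox share link ending in '?dl=0' into a direct-download link."""
    return shared_url.replace("?dl=0", "?dl=1")

def getAnswerURL():
    # Hypothetical share link, for illustration only.
    return to_direct_link("https://www.dropbox.com/s/abc123/answer.csv?dl=0")
```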
The answer CSV file should contain several rows, each of the form “Track#, FishingType, Prob”.
The probability for any combination of track number and fishing type not specified in the answer
CSV will be assumed to be 0.0. Any value greater than 1.0 will be treated as 1.0, and values less
than 0.0 will be treated as 0.0.
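As a sketch of producing those rows, with probabilities clipped into [0.0, 1.0] as the scorer will treat them (the fixed-point formatting is an arbitrary choice, not a stated requirement):

```python
def clamp(p):
    """Clip a probability into [0.0, 1.0], mirroring how the scorer treats out-of-range values."""
    return min(1.0, max(0.0, p))

def format_answer_rows(predictions):
    """predictions: iterable of (track_number, fishing_type, probability) tuples."""
    return ["{},{},{:.6f}".format(t, f, clamp(p)) for t, f, p in predictions]

rows = format_answer_rows([(1, "trawler", 0.87), (1, "seiner", 1.3)])
```

Any (track, type) pair you omit is scored as probability 0.0, so you only need rows for nonzero predictions.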
Your predictions will be scored against the ground truth using the area under the receiver
operating characteristic (ROC) curve. Some of the records in the test set will be used for
provisional scoring, and others for system test scoring. (You will not know which records
belong to which set.)
The score will be determined from the area under the ROC curve (AUC) using the following method:
- The contestant's submission will score each track with a probability that the vessel was
engaged in each type of fishing.
- Each fishing type will be treated as a binary classifier, and its AUC will be calculated
as described in the following steps.
- The true positive rates and the false positive rates are determined as follows,
where N_TPR is the total number of records of the given fishing type, N_FPR is the total
number of records not of that fishing type, and N_TPR + N_FPR = N (the total number of
records with known status in the test). Here s_i is 1 if the i-th record, with records
sorted by submitted probability in descending order, is of the given fishing type, and 0
otherwise:
- TPR_i = Accumulate[s_i] / N_TPR
- FPR_i = Accumulate[1 - s_i] / N_FPR
- with the addition: FPR_0 = 0
- Then the AUC is determined as a numerical integral of TPR over FPR:
AUC = Sum [TPR_i * (FPR_i - FPR_{i-1})]
- The four AUC values are then combined in a weighted average, which is scaled to determine the final score:
Score = max(1,000,000 * (2 * WeightedAverage - 1), 0)
The weights are as follows:
- trawler: 40%
- seiner: 30%
- longliner: 20%
- support: 10%
The following links are included because they were all listed with the previous "Fishing for
Fishermen" contest. Though they are not all strictly necessary or directly useful in this
iteration, we have left them in for anyone with a general interest in the subject.
Useful site for the entire AIS message (each line of data to be provided is an AIS "sentence"):
ITU document describing “payload” field (field 6 of each message):
How to calculate the NMEA checksum:
The distance between 2 points, given their longitude and latitude, can be calculated using the
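The formula the sentence above refers to is not named here; the usual choice for great-circle distance from latitude/longitude is the haversine formula, sketched below under that assumption (the Earth-radius constant of 3440.065 nautical miles is a mean value):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance in nautical miles between two lat/lon points (degrees).
    Haversine formula; 3440.065 nm is a mean Earth radius."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 3440.065 * asin(sqrt(a))
```

This is handy for turning successive track positions into traveled distance, which can be cross-checked against the reported SOG.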
Your report must be at least 2 pages long, contain at least the following sections, and use
the section and bullet names below.
This section must contain at least the following:
- First Name
- Last Name
- Topcoder handle
- Email address
- Final code submission file name
Please describe your algorithm so that we know what you did even before seeing your code. Use
line references to refer to specific portions of your code.
This section must contain at least the following:
- Approaches considered
- Approach ultimately chosen
- Steps to approach ultimately chosen, including references to specific lines of your code
- Open source resources and tools used, including URLs and references to specific lines of your code
- Advantages and disadvantages of the approach chosen
- Comments on libraries
- Comments on open source resources used
- Special guidance given to algorithm based on training
- Potential improvements to your algorithm