UKHO Tidal Stream Prediction First-to-Finish Contest Statement
| First Contestant | |
| to Obtain a Score of | USD |
| 1,000,000 | $6,000 |
This is a first-to-finish data science contest. The first competitor to obtain both provisional and system test score of 1,000,000 as outlined in the “Scoring” section below will win $6,000, and the match will end.
The alternating horizontal movement of water associated with the rise and fall of the tide caused by tide-producing forces is known as tidal stream. Tidal streams have measureable directions and speeds that vary regularly over time.
The physical locations of vessels at sea are tracked throughout their journeys, using the Automatic Identification System (“AIS”), which records their positions (latitude and longitude) over time. AIS data may provide an opportunity to calculate tidal stream direction and speed.
When a vessel is at anchor in open water, the ocean current moves it until its anchor chain or line is taut. Then, as the current changes, the vessel rotates slowly around its own anchorage center. The changes in the vessel’s position, which are recorded by the AIS measurements, trace out circular, arc-like, or elliptical shapes over time.
In the customary training and test data framework, create an algorithm that uses AIS data and data on vessel characteristics to predict tidal stream direction and speed.
We suggest you begin by creating a way to identify circular, arc-like, or elliptical shapes of vessels in motion around their own anchorage centers and filtering out data points which aren’t part of these particular patterns. We haven’t tried to identify these shapes, because that’s the central challenge of this contest. Having created such a way to identify them, next try to use the changes in the positions of anchored vessels over time and the variables described below to predict tidal stream direction and speed. You may find interpolation to be useful.
The mathematics of positions on the curved surface of the earth can be treacherous, as evidenced by libraries like GeographicLib. Wikipedia’s entry on the world geodetic system is definitely worth a look. See also Charles Karney’s paper on geodetic measurements.
In the data available for this contest, each record contains 16 columns, belonging to a vessel, which are described below. Note that the measurements recorded in the sog, cog, and heading variables may be very inaccurate. Furthermore, rot, heading, beam, and length have missing values.
Index Name Type Description Values
----- ------------- ------ ----------------------------------------------------------------------- --------------
1 arkposid String Record unique identifier 15 char
2 mmsi String Marine mobile service (i.e., ship) identifier 9 char
3 lon Float Longitude of a ship’s location (degrees from the Prime Meridian) [-180, 180]
4 lat Float Latitude of a ship’s location (degrees from the Equator) [-90, 90]
5 sog Float Speed over ground (nautical miles per hour) - may be very inaccurate [0.0, 30.0]
6 cog Float Course over ground (degrees from true north) - may be very inaccurate [0.0, 359.9]
7 rot Float Ship's rate of turn (degrees) [0.0, 30.0]
8 heading Float Ship's heading (degrees from true north) - may be very inaccurate [0.0, 359.9]
9 beam Float Ship's beam (i.e., width at the widest point) (meters) [6.0, 28.0]
10 length Float Ship's length (meters) [16.0, 92.0]
11 draught Float Ship's draught (meters) See https://en.wikipedia.org/wiki/Draft_(hull). [0.9, 5.5]
12 grosstonnage Float Ship's gross tonnage (proportional to volume of the ship and unitless) [123.0, 5125.0]
13 shipname String Ship's name Various
14 relative time Int Relative time in minutes compared to first measurement in the set at which
(minutes) the variables lon through heading and direction and speed were recorded [0, 220000]
15 direction Float Tidal stream direction (degrees from true north) [0.0, 359.9]
16 speed Float Tidal stream speed (meters per second) [0.0, 1.0]
- The full data set used in this contest contains almost 65,000 AIS observations.
- We sorted the data by time and used the first ~60% as training data.
- The next ~15% of the data was used for provisional tests, and the remaining ~25% for system tests.
- For each test, approximately 40% of the data (from that segment) is selected for training, and approximately 30% for testing.
- For provisional tests, all example data is also added to the training set.
- For system tests, all example and approximately 40% of the provisional data is also added to the training set.
- There are 3 provisional test cases and at least 2 system test cases.
- The training data set is available here
You must implement a class TidalStream with only one function, predict, which receives the testType (0, 1, or 2, to indicate Example, Provisional or System test, respectively), trainingData, and testingData.
In the string trainingData, each string contains a single record, and has 16 tokens, comma-separated, in the same order as described in Data Description section above.
The format of testingData is the same as the trainingData, except for the absence of the last 2 columns, which are the variables to be predicted.
The returned double should contain the corresponding predictions for direction and speed, where element 2*x is the direction and element 2*x+1 is the speed predicted for element x of testingData. The length of the return array should be equal to twice the length of testingData.
Define the following
Final score is the average of individual scores for all test cases within the test set.
Provisional test scores will appear on the leaderboard throughout the match, as they normally do. Once a submission obtains a provisional test score ≥ 1,000,000, we will perform system tests on that solution. The match will end as soon as a solution obtains a system test score ≥ 1,000,000.
- Because different test types deal with different volumes of data, the time limits will also differ. Example tests are limited to 600s (10 minutes), provisional tests to 900s (15 minutes) and system tests to 1500s (25 minutes). The testType parameter will be 0, 1, or 2, to indicate Example, Provisional, or System test, respectively, so that your code can take timing into account.
- The memory limit is 2048 megabytes.
- There is no explicit code size limit. The implicit source code size limit is around 1 MB (it is not advisable to submit codes of size close to that or larger). The compilation time limit is 60 seconds.
- You will be allowed to make example submissions each 30 minutes and provisional submissions each 3 hours.
- This match is not rated.
- The allowed programming languages are C++, Java, C#, Python and VB.
- You can include open source code in your submission, provided it is free for you to use and would be for the client as well. Code from open source libraries under the BSD or MIT licenses will be accepted. Other open source licenses may be accepted too, just be sure to ask us.
- The test servers have only the default installs of all languages, so no additional libraries will be available.
- Use the match forum to ask general questions or report problems, but please do not post comments and questions that reveal information about the problem itself or possible solution techniques.
Requirements to Win a Prize
- Achieve a score of 1,000,000, according to system test results. See the “Scoring” section below.
Within 7 days from the announcement of the challenge winners, submit a complete report at least 2 pages long containing at least the following sections:
- First Name
- Last Name
- Topcoder handle
- Email address
- Please describe your algorithm so that we know what you did even before seeing your code. Use line references to refer to specific portions of your code.
This section must contain at least the following:
- Approaches considered
- Approach ultimately chosen
- Steps to approach ultimately chosen, including references to specific lines of your code
- Advantages and disadvantages of the approach chosen
- Comments on open source resources used
- Special guidance given to algorithm based on training
- Potential improvements to your algorithm
Local Programs Used
- If any parameters were obtained from the training data set, you will also need to provide the program(s) used to generate these parameters.
- There is no restriction on the programming language/libraries used to generate these training parameters, provided they are a free, open-source solution for you to use and would be for the client as well.
If you are the first to obtain a system test score ≥ 1,000,000 but fail to do any of the above, then you will not receive your prize until you have remedied that.