JOIN
Get Time
long_comps_topcoder  Problem Statement
Contest: UKHO Tidal Streams F2F
Problem: TidalStream

Problem Statement

    

UKHO Tidal Stream Prediction First-to-Finish Contest Statement

Prize Distribution

 ---------------------- ------------
|   First Contestant   |            |
| to Obtain a Score of |    USD     |
 ---------------------- ------------
|      1,000,000       |   $6,000   |
 ---------------------- ------------

Contest Format

This is a first-to-finish data science contest. The first competitor to obtain both provisional and system test score of 1,000,000 as outlined in the “Scoring” section below will win $6,000, and the match will end.

Background

The alternating horizontal movement of water associated with the rise and fall of the tide caused by tide-producing forces is known as tidal stream. Tidal streams have measureable directions and speeds that vary regularly over time.

The physical locations of vessels at sea are tracked throughout their journeys, using the Automatic Identification System (“AIS”), which records their positions (latitude and longitude) over time. AIS data may provide an opportunity to calculate tidal stream direction and speed.

When a vessel is at anchor in open water, the ocean current moves it until its anchor chain or line is taut. Then, as the current changes, the vessel rotates slowly around its own anchorage center. The changes in the vessel’s position, which are recorded by the AIS measurements, trace out circular, arc-like, or elliptical shapes over time.

Objective

In the customary training and test data framework, create an algorithm that uses AIS data and data on vessel characteristics to predict tidal stream direction and speed.

Hints

We suggest you begin by creating a way to identify circular, arc-like, or elliptical shapes of vessels in motion around their own anchorage centers and filtering out data points which aren’t part of these particular patterns. We haven’t tried to identify these shapes, because that’s the central challenge of this contest. Having created such a way to identify them, next try to use the changes in the positions of anchored vessels over time and the variables described below to predict tidal stream direction and speed. You may find interpolation to be useful.

The mathematics of positions on the curved surface of the earth can be treacherous, as evidenced by libraries like GeographicLib. Wikipedia’s entry on the world geodetic system is definitely worth a look. See also Charles Karney’s paper on geodetic measurements.

Data Description

In the data available for this contest, each record contains 16 columns, belonging to a vessel, which are described below. Note that the measurements recorded in the sog, cog, and heading variables may be very inaccurate. Furthermore, rot, heading, beam, and length have missing values.

Index   Name            Type    Description                                                                 Values
-----   -------------   ------  -----------------------------------------------------------------------     --------------
1       arkposid        String  Record unique identifier                                                    15 char
2       mmsi            String  Marine mobile service (i.e., ship) identifier                               9 char
3       lon             Float   Longitude of a ship’s location (degrees from the Prime Meridian)            [-180, 180]
4       lat             Float   Latitude of a ship’s location (degrees from the Equator)                    [-90, 90]
5       sog             Float   Speed over ground (nautical miles per hour) - may be very inaccurate        [0.0, 30.0]
6       cog             Float   Course over ground (degrees from true north) - may be very inaccurate       [0.0, 359.9]
7       rot             Float   Ship's rate of turn (degrees)                                               [0.0, 30.0]
8       heading         Float   Ship's heading (degrees from true north) - may be very inaccurate           [0.0, 359.9]
9       beam            Float   Ship's beam (i.e., width at the widest point) (meters)                      [6.0, 28.0]
10      length          Float   Ship's length (meters)                                                      [16.0, 92.0]
11      draught         Float   Ship's draught (meters) See https://en.wikipedia.org/wiki/Draft_(hull).     [0.9, 5.5]
12      grosstonnage    Float   Ship's gross tonnage (proportional to volume of the ship and unitless)      [123.0, 5125.0]
13      shipname        String  Ship's name                                                                 Various
14      relative time   Int	 Relative time in minutes compared to first measurement in the set at which
        (minutes)               the variables lon through heading and direction and speed were recorded     [0, 220000]
15      direction       Float   Tidal stream direction (degrees from true north)                            [0.0, 359.9]
16      speed           Float   Tidal stream speed (meters per second)                                      [0.0, 1.0]

Data Sets

  • The full data set used in this contest contains almost 65,000 AIS observations.
  • We sorted the data by time and used the first ~60% as training data.
  • The next ~15% of the data was used for provisional tests, and the remaining ~25% for system tests.
  • For each test, approximately 40% of the data (from that segment) is selected for training, and approximately 30% for testing.
  • For provisional tests, all example data is also added to the training set.
  • For system tests, all example and approximately 40% of the provisional data is also added to the training set.
  • There are 3 provisional test cases and at least 2 system test cases.
  • The training data set is available here

Functions

You must implement a class TidalStream with only one function, predict, which receives the testType (0, 1, or 2, to indicate Example, Provisional or System test, respectively), trainingData, and testingData.

In the string[] trainingData, each string contains a single record, and has 16 tokens, comma-separated, in the same order as described in Data Description section above.

The format of testingData is the same as the trainingData, except for the absence of the last 2 columns, which are the variables to be predicted.

The returned double[] should contain the corresponding predictions for direction and speed, where element 2*x is the direction and element 2*x+1 is the speed predicted for element x of testingData. The length of the return array should be equal to twice the length of testingData.

Scoring

Define the following

Final score is the average of individual scores for all test cases within the test set.

Provisional test scores will appear on the leaderboard throughout the match, as they normally do. Once a submission obtains a provisional test score ≥ 1,000,000, we will perform system tests on that solution. The match will end as soon as a solution obtains a system test score ≥ 1,000,000.

Limits

  • Because different test types deal with different volumes of data, the time limits will also differ. Example tests are limited to 600s (10 minutes), provisional tests to 900s (15 minutes) and system tests to 1500s (25 minutes). The testType parameter will be 0, 1, or 2, to indicate Example, Provisional, or System test, respectively, so that your code can take timing into account.
  • The memory limit is 2048 megabytes.
  • There is no explicit code size limit. The implicit source code size limit is around 1 MB (it is not advisable to submit codes of size close to that or larger). The compilation time limit is 60 seconds.
  • You will be allowed to make example submissions each 30 minutes and provisional submissions each 3 hours.

General notes

  • This match is not rated.
  • The allowed programming languages are C++, Java, C#, Python and VB.
  • You can include open source code in your submission, provided it is free for you to use and would be for the client as well. Code from open source libraries under the BSD or MIT licenses will be accepted. Other open source licenses may be accepted too, just be sure to ask us.
  • The test servers have only the default installs of all languages, so no additional libraries will be available.
  • Use the match forum to ask general questions or report problems, but please do not post comments and questions that reveal information about the problem itself or possible solution techniques.

Requirements to Win a Prize

  1. Achieve a score of 1,000,000, according to system test results. See the “Scoring” section below.
  2. Within 7 days from the announcement of the challenge winners, submit a complete report at least 2 pages long containing at least the following sections:

    Your Information

    • First Name
    • Last Name
    • Topcoder handle
    • Email address

    Approach Used

    • Please describe your algorithm so that we know what you did even before seeing your code. Use line references to refer to specific portions of your code.

    This section must contain at least the following:

    • Approaches considered
    • Approach ultimately chosen
    • Steps to approach ultimately chosen, including references to specific lines of your code
    • Advantages and disadvantages of the approach chosen
    • Comments on open source resources used
    • Special guidance given to algorithm based on training
    • Potential improvements to your algorithm

    Local Programs Used

    • If any parameters were obtained from the training data set, you will also need to provide the program(s) used to generate these parameters.
    • There is no restriction on the programming language/libraries used to generate these training parameters, provided they are a free, open-source solution for you to use and would be for the client as well.

If you are the first to obtain a system test score ≥ 1,000,000 but fail to do any of the above, then you will not receive your prize until you have remedied that.

 

Definition

    
Class:TidalStream
Method:predict
Parameters:int, String[], String[]
Returns:double[]
Method signature:double[] predict(int testType, String[] trainingData, String[] testingData)
(be sure your method is public)
    
 

Examples

0)
    
Seed: 1

This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2010, TopCoder, Inc. All rights reserved.