In addition to the problem statement, some additional information is provided in this and this threads.
GPS does not work indoors and even in an outdoor environment is usually accurate to no more than 10 meters of actual path taken. Modern smartphone are equipped with accelerometers, gyroscopes, magnetometers and other sensors that can be used to approximate human motion and therefore determine an approximate location from a known start point, otherwise known as dead reckoning. The problem is that these consumer grade sensors are typically low quality and drift over time meaning that error will be introduced over time and eventually compound to the point where the derived position is no longer within the tolerance for accuracy. This is compounded by the fact that the user may hold the phone in an unknown and inconsistent orientation during movement.
This challenge asks you to try to deal with the mentioned problem. You will need to implement an online algorithm that will receive the data from a smartphone's sensors. At the end of each second the algorithm will need to calculate as precise estimate of the current user's location as possible.
The data in this problem consists of 17 routes. They are distributed into training, provisional and final test sets with 8, 3 and 6 routes, respectively. This was done as follows:
- 3 of the routes belong to the Northern Hemisphere. Of these, the shortest one is used for the provisional test set, the longest one for the training test set and the remaining route is used for the final test set.
- Of the 14 Southern Hemisphere routes, the longest one and the shortest one are used for training test set.
- Of the remaining 12 routes, the client has chosen 3 routes which are in their opinion the most representative ones. These are included to the final test set.
- Finally, the remaining 9 routes are split between training, provisional and final test sets in a random fashion.
The 8 routes from the training set can be downloaded here. Each folder corresponds to one route. Files ACCELEROMETER.txt, GYROSCOPE.txt, LIGHT.txt, LINEAR_ACCELERATION.txt, MAGNETIC_FIELD.txt, ORIENTATION.txt, PROXIMITY.txt (avaliable not for all routes), ROTATION_VECTOR.txt contain data collected from various sensors. First one or several lines explain the data, then the data itself follows, one entry per line. Some additional documentation about sensors is located here. File LOCATION.txt contains GPS information collected along the route. It will be used to evaluate the quality of your algorithm. (There is some inaccuracy in this data, but this is the most accurate representation of these routes that is available.) File LOCATION_KML.kml contains the same information, but in KML format. You can directly open it in Google Earth and see what the route looks like. Finally, files START_LOCATION.txt and START_LOCATION_KML.kml contain only the starting point of the route, i.e., the dead reckoning (in other words, this is the first data line from LOCATION.txt and LOCATION_KML.kml).
All data was collected on a Samsung Galaxy S2 smartphone using the Data Recording application located here. You can use this application to generate more training data, though if your phone is not Samsung Galaxy S2, it can reduce the applicability of this data to this challenge.
You will need to implement two methods:
- init. It will be called in the beginning of your program. The parameter startLocation contains the single data line from START_LOCATION.txt file. You can return any int from this method, the return value will be ignored.
- estimateLocation. It will be called once per each second of the route (in other words, once for each data line of LOCATION.txt file, except the start location line). The parameters accelerometer, gyroscope, light, linearAcceleration, magneticField, orientation, proximity, rotationVector will contain all the data from files ACCELEROMETER.txt, GYROSCOPE.txt, LIGHT.txt, LINEAR_ACCELERATION.txt, MAGNETIC_FIELD.txt, ORIENTATION.txt, PROXIMITY.txt, ROTATION_VECTOR.txt that was collected during this second, one data line per element. The return value should be an estimate of the user's location at the end of this second. It should be a double containing three elements in this order: longitude (degree, in -180..180 range), latitude (degrees, in -90..90 range) and altitude (meters, in -10000..10000 range).
We use two methods to score a route -- global similarity and local similarity. Let's assume the route contains N+1 points (M, M, ..., M[N]) and your algorithm estimates the route as (P = M, P, ..., P[N]). Let dist(A, B) be the Euclidean distance between points A and B (in meters). The global similarity is defined as the arithmetic average of dist(M[i], P[i]), 1 <= i <= N. The local similarity is defined as the arithmetic average of dist(M[i] - M[i-1], P[i] - P[i-1]), 1 <= i <= N. Here the difference between two points A(x1, y1, z1) and B(x2, y2, z2) is the point (x1-x2, y1-y2, z1-z2) where all coordinates are measured in the same Cartesian coordinate system. On intuitive level, global similarity measures how precisely you are able to estimate points of the route and local similarity measures the ability to estimate user's movement during each particular second.
NOTE: it can be seen in LOCATION.txt files that sometimes multiple consecutive lines have the same values of latitude, longitude and altitude. This happens because no GPS readings are available for some periods of time. In each such group of lines, only the first line contains valid information, all other lines are not valid. When many GPS readings are missing, the scores can become very significantly spoiled. In order to correct that, the scoring is done only based on lines with valid information. More details can be found here.
For each route we will create two test cases. They will be different only by scoring method. In one of them we will use the global similarity as the score and in the other one the local similarity. If your program is unable to process a test case (due to time limit, memory limit, crash, wrong return value format, etc.), your score will be reported as -1.
The scores are then normalized as follows: if YOUR is your score for a test case and BEST is the best (minimum) score for this test case (among last submissions of all competitors), then your normalized score is equal to BEST / YOUR. A score of -1 becomes 0 after normalization.
Your overall score on a group of test cases will be calculated as 200000.0 * (arithmetic average of normalized scores on all global similarity test cases) + 800000.0 * (arithmetic average of normalized scores on all local similarity test cases).
The scoring implementation models Earth as a precise sphere with a radius of 6371 kilometers.
You can include open source code in your submission provided that it is clearly identified in your code and the following conditions are all fulfilled:
- Such inclusion does not violate the match rules.
- Your solution can be used for commercial purposes.
- If your solution is included into a software product, this product can be distributed with undisclosed source code.
The client is seeking for a general solution of this problem. Therefore your algorithm must work only with the sensor data it receives. For example, it not valid to use any local geographic data of a region where the training routes were recorded. If you are not sure whether your particular approach violates this rule or not, you can get it clarified by sending an email to firstname.lastname@example.org. Do not post your approach at forums though as this is against the rules.
In order to receive the prize money, you will need to fully document the derivation of all parameters internal to your algorithm. If these parameters were obtained from the training data set, you will also need to provide the program used to generate these training parameters. There is no restriction on the programming language used to generate these training parameters. Note that all these data should NOT be submitted anywhere during the coding phase. Instead, if you win a prize, a TopCoder representative will contact you directly in order to collect these data. The description must be submitted within 7 days after the contest results are published.
We reserve the right not to release the source code of all solutions after the end of the match. We also reserve the right to restrict the discussion of possible solution approaches at forums after the end of the match.