Contest: Quake Predictor X
Problem: QuakePredictorX

Prizes

The best 6 performers of this contest who have a positive score (according to system test results) will receive the following prizes:

  1st place: $4000
  2nd place: $1500
  3rd place: $750
  4th place: $250
  5th place: $250
  6th place: $250

The top 3 solutions of the first match are available here. Please feel free to use anything from those submissions.

Background

QuakeFinder (QF) has been involved for over 12 years in research on several electromagnetic indicators that may eventually provide short-term (weeks to days) warning of large seismic events. These electromagnetic signals or indicators appear to be related to stress-induced release of charge conductors (p-hole carriers) deep in the earth, near future earthquake hypocenters. See this document. The indicators manifest themselves as unusual magnetic (unipolar) pulses, large increases in air conductivity near the future epicenter, "earthquake lights" (extreme cases of air ionization), and apparent infrared patterns near the epicenter, as detected by various IR weather and environmental satellites. There may also be other manifestations that we are currently not aware of. You will be provided with several specific data sets (e.g. 3-axis induction magnetometer channels and an air conductivity sensor channel) from ground instruments near several earthquakes in California and Peru.

Problem Statement

Your task is to develop a software algorithm to uniquely identify the electromagnetic pulses that may precede an earthquake by days to weeks.

Each data set contains measurements from one site. The site provides 3 channels of information, sampled at a rate of 32 or 50 samples per second. The channel data will be given to your algorithm one hour at a time. For each site, your algorithm should return the odds ratio of an earthquake event happening in every coming hour, for a period of 90 days.

Because this is real, raw magnetometer data, it contains many signals whose origins are unrelated to earthquakes. Many of those signals may have an even stronger amplitude than the signals from the events themselves. Here are just a few examples of known signals of that type: vehicle engines, lightning, solar flares, electrical interference, magnetometer resets, ...

Implementation Details

You should implement the init method, which provides you with the sample rate. The init method will be called once for every test case. sampleRate contains H, the number of samples measured each second.

You should implement the forecast method. The method will be called once for each hour of data. hour contains the zero-based index of the hour to which the data belongs. The data array contains the measurements of all channels for that hour. The array will contain H*3600*3 elements. The ith measurement for channel c will be at data[c*(H*3600) + i]. The values in the data array are in the range [0,2^24]. Measurements are not always available: a value of -1 in the data array indicates that the measurement is not provided. K contains the planetary magnetic activity index at the given hour and is in the range [0,10].
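
The layout described above can be unpacked as in the following sketch. It is only an illustration of the indexing rule data[c*(H*3600) + i] and of the -1 convention; the helper names split_channels and channel_mean are hypothetical and are not part of the required interface.

```python
from typing import List, Optional

SECONDS_PER_HOUR = 3600
NUM_CHANNELS = 3
MISSING = -1  # value the statement uses for an unavailable measurement


def split_channels(data: List[int], sample_rate: int) -> List[List[int]]:
    """Split the flat hourly array into one list per channel.

    Sample i of channel c sits at data[c * (sample_rate * 3600) + i].
    """
    per_channel = sample_rate * SECONDS_PER_HOUR
    if len(data) != per_channel * NUM_CHANNELS:
        raise ValueError("data length does not match sample_rate * 3600 * 3")
    return [data[c * per_channel:(c + 1) * per_channel]
            for c in range(NUM_CHANNELS)]


def channel_mean(samples: List[int]) -> Optional[float]:
    """Mean of the available samples, or None if every sample is missing."""
    valid = [s for s in samples if s != MISSING]
    return sum(valid) / len(valid) if valid else None
```

Skipping the -1 entries before computing any statistic keeps missing measurements from being mistaken for very low readings.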

You should return an array N of size 2160 (90 days of 24 hours). Each value in the array should be the odds ratio of an earthquake event happening at time t. In other words, if h is the value of hour in the current call, return[t] should contain the odds ratio of an earthquake event at the site at hour h+t+1. (Important: the output has changed compared to the first match.)
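
As a minimal sketch of the interface, the class below returns the neutral value 1 for every one of the 2160 hours. It is a placeholder, not a recommended solution. The statement does not say what init should return, so the 0 used here is an assumption, and target_hour is a hypothetical helper that only spells out the index mapping.

```python
from typing import List

FORECAST_HOURS = 2160  # 90 days * 24 hours


class QuakePredictorX:
    def init(self, sampleRate: int) -> int:
        # Remember the sample rate (H) for later calls to forecast.
        self.sample_rate = sampleRate
        return 0  # assumption: the meaning of the return value is not specified

    def forecast(self, hour: int, data: List[int], K: float) -> List[float]:
        # Neutral baseline: no evidence either way, so every odds ratio is 1.
        return [1.0] * FORECAST_HOURS


def target_hour(hour: int, t: int) -> int:
    """Hour of the test case that entry t of the returned array refers to."""
    return hour + t + 1
```

For example, in the call with hour = 10, entry 0 of the returned array refers to hour 11 and entry 2159 refers to hour 2170.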

Each test case will have either one earthquake event or none. For event-positive test cases, your forecast method will stop being called once the event has occurred. For event-negative test cases, your forecast method will stop being called at a randomly selected moment before the end of the data set.

Scoring

Your forecast will be scored from day 32 (hour 768) onwards, until the earthquake event occurs. Your returned array N will not be normalized. Let Z=9. Let G be the entry of the array N that corresponds to the hour at which the actual event happened. If the test case contains an earthquake event, your score (F) for a single hour will be:

  F = 2 * G - (Sum of squared values in N) / (Z * sizeof(N)) - 1

In case an earthquake event is not present, let W[h] be a hidden weight for hour h. Your score for a single hour will then be:

  F = - (Sum of squared values in N) / (Z * sizeof(N)) * W[h]
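
The two per-hour formulas can be sketched as follows. This assumes the reading F = 2*G - S/(Z*sizeof(N)) - 1, with S the sum of squared values in N, which is the reading consistent with an all-ones return scoring zero overall. The hidden weight W[h] is not published, so it is passed in as a plain parameter, and event_index is assumed to fall inside the returned array; the function names are illustrative only.

```python
from typing import Optional, Sequence

Z = 9


def penalty(n: Sequence[float]) -> float:
    """Sum of squared values in N divided by Z * sizeof(N)."""
    return sum(v * v for v in n) / (Z * len(n))


def hourly_score(n: Sequence[float],
                 event_index: Optional[int] = None,
                 weight: float = 1.0) -> float:
    """Score F for one call to forecast.

    event_index: position in n of the hour at which the event happened,
                 or None for an event-negative test case.
    weight:      stand-in for the hidden W[h]; used only when event_index is None.
    """
    if event_index is not None:
        return 2.0 * n[event_index] - penalty(n) - 1.0
    return -penalty(n) * weight
```

With all 2160 entries equal to 1, the penalty term is 1/9, so an event-positive hour scores 8/9 and an event-negative hour scores -W[h]/9.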

Your raw score for a test case will be the sum of all its F scores, and your total score is the sum of the raw scores over all test cases. If your system test total score is negative, you will receive a zero system test total score. The total score shown on the leaderboard will be divided by the total number of hours for which F is calculated on event-positive test cases and then multiplied by 1000000.
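
The aggregation described above might look like the sketch below. The function name and arguments are hypothetical, and applying the clamp to zero before the scaling is an assumption: the statement only says that a negative system test total becomes zero.

```python
from typing import Sequence


def leaderboard_score(raw_scores: Sequence[float], positive_hours: int) -> float:
    """Combine per-test-case raw scores into the displayed total.

    raw_scores:     one raw score (sum of hourly F values) per test case.
    positive_hours: number of hours for which F was calculated on
                    event-positive test cases.
    """
    if positive_hours <= 0:
        raise ValueError("positive_hours must be positive")
    total = sum(raw_scores)
    total = max(total, 0.0)  # assumption: negative totals are clamped here
    return total / positive_hours * 1_000_000
```

For example, raw scores of 3.0 and -1.0 over 4 scored event-positive hours give 2.0 / 4 * 1000000 = 500000.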

Data sets

The training data sets consist of 50% of all the sets we have available for the contests. The data can be accessed here.

  • The 5 example test cases will execute your algorithm on these training data seeds: 1, 2, 3, 4 and 5. The training set consists of 424 test cases.
  • There are 132 provisional test cases and they are different from the training data.
  • There are 194 system test cases and they are different from the training and provisional data.

Tools

A visualization tool is provided for offline testing and can be downloaded here.

Special Notes

  • There is no normalization to 1 within a test case in this match. The algorithm's predictions on separate test cases are not expected to sum to the same value: the scoring rewards predictions that sum to a lower value on event-negative cases and to a higher value on event-positive ones.
  • The scoring system is balanced around the neutral baseline in which all values in N on all test cases are set to 1. We therefore recommend returning 1 when no factors are available that would raise or lower the odds of an earthquake at the given hour near the given site. A return of 1 for all values in N corresponds to a zero cumulative score.
  • The scoring system is balanced so that the overall value MeanAcrossAllTestcases[N] is of the order of 1. It penalizes returns of N that are consistently too large or too small.
  • The hidden weight for hour h, W[h], keeps the weight proportion of event-positive and event-negative test cases independent of the data hour at which your algorithm is called. The score is balanced so that the weight of event-negative cases is Z-1=8 times the weight of the positive ones, regardless of the hour.
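
The zero baseline mentioned in these notes can be checked with exact arithmetic. The sketch below assumes the event-positive score is 2*G - S/(Z*sizeof(N)) - 1 and that the hidden weights W[h] on event-negative hours add up to exactly Z-1 = 8 times the number of scored event-positive hours; the actual weights are not published, and baseline_total is a hypothetical name.

```python
from fractions import Fraction

Z = 9
SIZE = 2160  # length of the returned array N


def baseline_total(positive_hours: int) -> Fraction:
    """Cumulative score when every entry of every returned N equals 1."""
    penalty = Fraction(SIZE * 1 * 1, Z * SIZE)   # sum of squares / (Z * size) = 1/9
    positive_f = 2 * 1 - penalty - 1             # event-positive hour: 8/9
    negative_weight = (Z - 1) * positive_hours   # assumed total of W[h]
    return positive_hours * positive_f - negative_weight * penalty
```

Under these assumptions each event-positive hour contributes 8/9 and the event-negative hours together contribute -8/9 per positive hour, so the total is 0 for any number of hours.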

Special Rules

  • You are not allowed to use the global quake information from the first match, specifically the data in the Quakes.xml and SiteInfo.xml files.
  • We are very excited to give you the opportunity to work on real, not artificial, earthquake data. Given that, we want to state clearly that your solution is not allowed to use external data sources or to exploit any kind of overlap between test cases in order to localize the timing of the event or to favor a particular location. Your solution must presume that the chance of the event is equal for all hours and all sites of the test case, and modify those probabilities based only on analysis of the signal from the given test case. Solutions will be screened and tested after the contest, and those that do not comply with this request will be disqualified. Please don't hesitate to send us a request if you are considering a technique that might be borderline. We want this to be an exciting scientific contest, so help us make it fair!

 

Definition

    
Class: QuakePredictorX
Method: forecast
Parameters: int, int[], double
Returns: double[]
Method signature: double[] forecast(int hour, int[] data, double K)

Method: init
Parameters: int
Returns: int
Method signature: int init(int sampleRate)
(be sure your methods are public)
    
 

Notes

-The time limit is 60 minutes per test case (this includes only the time spent in your code).
-The memory limit is 4096 megabytes.
-There is no explicit code size limit. The implicit source code size limit is around 1 MB (it is not advisable to submit code of a size close to that or larger). Once your code is compiled, the binary size should not exceed 1 MB.
-The compilation time limit is 30 seconds. You can find information about compilers that we use and compilation options here.
 

Examples

0)
    
Seed = 1
1)
    
Seed = 2
2)
    
Seed = 3
3)
    
Seed = 4
4)
    
Seed = 5

This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2010, TopCoder, Inc. All rights reserved.