Contest: Quake Predictor
Problem: QuakePredictor

Problem Statement

    

Prizes

The best 5 performers of this contest (according to system test results) will receive the following prizes:

  1st place: $10000
  2nd place: $6000
  3rd place: $4000
  4th place: $3000
  5th place: $2000

Background

QuakeFinder (QF) has been involved for over 12 years in research on several electromagnetic indicators that may eventually provide short-term (weeks to days) warning of large seismic events. These electromagnetic signals or indicators appear to be related to stress-induced release of charge conductors (p-hole carriers) deep in the earth, near future earthquake hypocenters. See this document. The indicators manifest themselves as unusual magnetic (unipolar) pulses, large increases in air conductivity near the future epicenter, "earthquake lights" (extreme cases of air ionization), and apparent infrared patterns seen near the epicenter, as detected by various IR weather and environmental satellites. There may also be other manifestations that we are currently not aware of. You will be provided with several specific data sets (e.g. 3-axis induction magnetometer channels and an air conductivity sensor channel) from ground instruments near several earthquakes in California and Peru.

Problem Statement

Your task is to develop a software algorithm to uniquely identify the electromagnetic pulses that may precede an earthquake by days to weeks.

Each data set contains measurements from 5 to 9 different sites. Each site provides 3 channels of information, measured at a frequency of 32 or 50 samples per second. Hourly data for all the sites and channels will be given to your algorithm. Your algorithm should return the probability of an earthquake event happening for every coming hour at each site, for a period of 90 days.

The earthquake signal propagation speed is on the order of 4 km/second, but here is a more precise formula for the earthquake transit time. The distance in km between two locations can be calculated with the formula given here.
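The linked formulas are not reproduced in this statement, so the following is only a rough sketch of how the distance and transit time could be estimated. It assumes the haversine great-circle formula and a mean Earth radius of 6371 km (both assumptions, not necessarily the formulas the organizers link to) together with the approximate 4 km/second speed quoted above. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius
WAVE_SPEED_KM_S = 4.0     # "on the order of 4 km/second" per the statement


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def transit_time_s(lat1, lon1, lat2, lon2):
    """Approximate travel time in seconds at the nominal propagation speed."""
    return distance_km(lat1, lon1, lat2, lon2) / WAVE_SPEED_KM_S
```

This ignores quake depth and any variation of speed with distance, which the more precise transit-time formula presumably accounts for.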

Because this is real, raw magnetometer data, it contains many signals whose origins are unrelated to earthquakes. Many of those signals may have even stronger amplitude than the signals from the events themselves. Here are a few examples of known signals of that type: vehicle engines, lightning, solar flares, electrical interference, magnetometer resets, ...

Implementation Details

You should implement the init method, which provides you with information about the sample rate, the number of sites and their locations. The init method will be called once for every test case. sampleRate contains the number of samples measured per second (H). numOfSites contains the number of sites (S). The sitesData array contains the location of each site. The latitude of the ith site is available in sitesData[i*2] and the longitude in sitesData[i*2+1]. You can return any integer; the value will be ignored.
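As an illustration of the sitesData layout described above, a minimal sketch that unpacks the flat array into (latitude, longitude) pairs could look like this. The helper name parse_sites is hypothetical.

```python
def parse_sites(num_of_sites, sites_data):
    """Return a list of (latitude, longitude) tuples, one per site.

    Site i has its latitude at sites_data[i*2] and its longitude at
    sites_data[i*2 + 1], as stated in the problem.
    """
    return [(sites_data[i * 2], sites_data[i * 2 + 1])
            for i in range(num_of_sites)]
```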

You should implement the forecast method. The method will be called once for each hour of data. hour contains the zero-based index of the hour to which the data belongs. The data array contains the measurements for all sites and all their channels for the specific hour. The array will contain H*3600*S*3 elements. The ith measurement for channel c from site j will be at data[j*(H*3600*3) + c*(H*3600) + i]. Values in the data array are in the range [0,2^24]. Measurements are not always available: a value of -1 in the data array indicates that the measurement is not provided. K contains the planetary magnetic activity index at the given hour and is in the range [0,10]. globalQuakes contains information about earthquakes that happened at other locations during the hour. The latitude of the ith quake is available in globalQuakes[i*5] and the longitude in globalQuakes[i*5+1]. The depth of the quake is in globalQuakes[i*5+2] and the magnitude is in globalQuakes[i*5+3]. globalQuakes[i*5+4] contains the time in seconds from the start of the test case when the earthquake happened.
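The indexing rules above can be sketched as follows. The helper names are hypothetical, and the mean over valid samples is only an example of how missing (-1) values might be skipped, not a recommended feature.

```python
MISSING = -1  # marker for an unavailable measurement, per the statement


def channel_samples(data, sample_rate, site, channel):
    """Slice out one hour of samples for the given site and channel (0..2)."""
    n = sample_rate * 3600            # samples per channel per hour (H*3600)
    start = site * (n * 3) + channel * n
    return data[start:start + n]


def valid_mean(samples):
    """Mean of the samples that are present, or None if all are missing."""
    valid = [v for v in samples if v != MISSING]
    return sum(valid) / len(valid) if valid else None


def parse_quakes(global_quakes):
    """Split globalQuakes into (lat, lon, depth, magnitude, time_s) tuples."""
    return [tuple(global_quakes[i:i + 5])
            for i in range(0, len(global_quakes), 5)]
```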

You should return a matrix N of size S * 2160. Each value in the matrix should be the probability of an earthquake event happening at site j at time t. In other words, the value at return[t*S + j] should contain the probability of an earthquake event at site j at hour t.
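To make the layout of the returned array concrete, here is a small sketch of the index arithmetic. The names flat_index and new_matrix are hypothetical, and the constant 2160 is taken to be 90 days of 24 hours.

```python
TOTAL_HOURS = 2160  # 90 days * 24 hours


def flat_index(hour, site, num_sites):
    """Position of the cell for (hour t, site j) in the returned array."""
    return hour * num_sites + site


def new_matrix(num_sites, fill=0.0):
    """Flat array of S * 2160 cells, all set to the same starting value."""
    return [fill] * (num_sites * TOTAL_HOURS)
```

For example, with 5 sites the probability for site 3 at hour 2 is stored at index 13.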

Each test case will have only one earthquake event. Your forecast method will stop being called once the event has occurred.

Scoring

Your forecast will be scored from day 32 (hour 768) onwards, until the earthquake event occurs. Your returned matrix N will be normalized in the interval from hour max(hour, 768) to 2160. Let NN be the normalized sub-matrix of N. Let G be the entry in the matrix NN where the actual event happened at the specific hour and at the closest site to the epicenter. Your score (F) for a single hour will then be:

  F = sizeof(NN) * ( 2 * G - Sum of squared values in NN) - 1  

The scoring system is adjusted so that your algorithm receives a zero score on a test if it constantly returns equal probabilities for all hourly cells. Also, the scoring formula is built in such a way that adding random noise to the output reduces the expected score.
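The hourly score can be sketched as below. This is a reading of the statement, not the official scorer: it assumes that "normalized" means dividing the sub-matrix by its sum so that its entries add up to 1, and that the sub-matrix covers all sites for hours max(hour, 768) through 2159. The function and parameter names are hypothetical.

```python
SCORING_START = 768
TOTAL_HOURS = 2160


def hourly_score(matrix, num_sites, hour, event_hour, event_site):
    """Score F for the forecast returned at `hour`.

    event_hour / event_site identify the cell where the event actually
    happened (the site closest to the epicenter).
    """
    first = max(hour, SCORING_START)
    sub = matrix[first * num_sites:TOTAL_HOURS * num_sites]
    total = sum(sub)
    if total <= 0:
        raise ValueError("cannot normalize an all-zero forecast")
    nn = [v / total for v in sub]                      # assumed normalization
    g = nn[(event_hour - first) * num_sites + event_site]
    return len(nn) * (2 * g - sum(v * v for v in nn)) - 1
```

Under these assumptions a uniform forecast scores 0, as the statement says, and putting all probability on the correct cell gives the maximum of sizeof(NN) - 1.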

Your raw score for a test case will then be the sum of all the F scores. Finally, your total score is equal to the sum of raw scores on all test cases. If your total score is negative, you will receive a zero total score. Your total score shown on the leaderboard will be divided by the number of hours for which F is calculated on all test cases. Finally, it will be multiplied by 1000000.
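The aggregation described above can be sketched as follows, assuming the steps are applied in the order they are listed (sum, clamp at zero, divide by the number of scored hours, multiply by 1000000). The function name and input layout are hypothetical: one list of hourly F values per test case.

```python
def leaderboard_score(per_case_hourly_scores):
    """Combine hourly F scores from all test cases into the displayed total."""
    raw_scores = [sum(hours) for hours in per_case_hourly_scores]
    total = max(0.0, sum(raw_scores))          # negative totals become zero
    scored_hours = sum(len(hours) for hours in per_case_hourly_scores)
    if scored_hours == 0:
        return 0.0
    return 1000000 * total / scored_hours
```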

Data sets

The training data sets consist of 50% of all the sets we have available for the contest. The data can be accessed here.

  • The 4 example test cases will execute your algorithm on these training data seeds: 4, 6, 7 and 8. The training set consists of 75 test cases.
  • The 30 provisional test cases will use 20% of the data and are different from the training data.
  • The 45 system test cases will use the remaining 30% of the data and are also different from the training and provisional data.

Tools

A visualization tool is provided for offline testing and can be downloaded here.

Special Rules

We are very excited to provide you with the opportunity to work on real, not artificial, earthquake data. Given that, we want to specifically clarify that your solution is not allowed to use external data sources or exploit any kind of overlap between test cases in order to localize the timing of the event or to favor a particular location. Your solution must presume that the chance of the event is equal for all the hours and all the sites of the test case, and modify those probabilities based only on the analysis of the signal from the given test case. The solutions will be screened and tested after the contest, and those that do not comply with this rule will be disqualified. Please don't hesitate to send us a request if you are considering a technique that might be borderline. We want this to be an exciting scientific contest, so help us to make it fair!

 

Definition

    
Class: QuakePredictor
Method: init
Parameters: int, int, double[]
Returns: int
Method signature: int init(int sampleRate, int numOfSites, double[] sitesData)
 
Method: forecast
Parameters: int, int[], double, double[]
Returns: double[]
Method signature: double[] forecast(int hour, int[] data, double K, double[] globalQuakes)
(be sure your methods are public)
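A minimal skeleton matching these signatures might look like the sketch below. It is written in Python for illustration only; the attribute names are hypothetical, and the body is just the baseline that assigns equal probability to every cell, which is a starting point rather than a solution.

```python
class QuakePredictor:
    TOTAL_HOURS = 2160  # forecast horizon: 90 days of hourly cells

    def init(self, sampleRate, numOfSites, sitesData):
        # Remember the test-case parameters for later forecast calls.
        self.sample_rate = sampleRate
        self.num_sites = numOfSites
        self.sites = [(sitesData[i * 2], sitesData[i * 2 + 1])
                      for i in range(numOfSites)]
        return 0  # the return value is ignored

    def forecast(self, hour, data, K, globalQuakes):
        # Baseline: the same probability for every (hour, site) cell.
        cells = self.num_sites * self.TOTAL_HOURS
        return [1.0 / cells] * cells
```

A real solution would adjust these probabilities using only the signal in data, K and globalQuakes for the current test case.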
    
 

Notes

-The time limit is 60 minutes per test case (this includes only the time spent in your code).
-The memory limit is 4096 megabytes.
-There is no explicit code size limit. The implicit source code size limit is around 1 MB (submitting code close to or above that size is not advisable). Once your code is compiled, the binary size should not exceed 1 MB.
-The compilation time limit is 30 seconds. You can find information about compilers that we use and compilation options here.
 

Examples

0) Seed = 4
1) Seed = 6
2) Seed = 7
3) Seed = 8

This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2010, TopCoder, Inc. All rights reserved.