The best 5 performers of this contest (according to system test results) will receive the following prizes:
- 1st place: $10000
- 2nd place: $6000
- 3rd place: $4000
- 4th place: $3000
- 5th place: $2000
QuakeFinder (QF) has spent over 12 years researching several electromagnetic indicators that may eventually provide short-term (weeks to days) warning of large seismic events. These electromagnetic signals or indicators appear to be related to stress-induced release of charge conductors (p-hole carriers) deep in the earth, near future earthquake hypocenters. See this document. The indicators manifest themselves as unusual magnetic (unipolar) pulses, large increases in air conductivity near the future epicenter, "earthquake lights" (extreme cases of air ionization), and apparent infrared patterns near the epicenter, as detected by various IR weather and environmental satellites. There may also be other manifestations that we are currently not aware of.
You will be provided with several specific data sets (e.g. 3-axis induction magnetometer channels and an air conductivity sensor channel) from ground instruments near several earthquakes in California and Peru.
Your task is to develop a software algorithm to uniquely identify the electromagnetic pulses that may precede an earthquake by days to weeks.
Each data set contains measurements from 5 to 9 different sites. Each site provides 3 channels of information, measured at a frequency of 32 or 50 samples per second. Hourly data for all the sites and channels will be given to your algorithm. Your algorithm should return the probability of an earthquake event happening for every coming hour at each site, for a period of 90 days.
The earthquake signal propagation speed is on the order of 4 km/second, but here is a more precise formula for the earthquake transit time. The distance in km between two locations can be calculated with the formula given here.
Because this is real, raw magnetometer data, it contains many signals whose origins are unrelated to earthquakes. Many of those signals may have even stronger amplitudes than the signals from the events themselves. Here are just a few examples of known signals of that type: vehicle engines, lightning, solar flares, electrical interference, magnetometer resets, ...
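The linked formulas are not reproduced in this statement, so the following is only a rough sketch under stated assumptions: it uses the standard haversine great-circle formula with a mean Earth radius of 6371 km, and the approximate 4 km/second speed quoted above in place of the more precise transit-time formula. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, not taken from the statement
WAVE_SPEED_KM_S = 4.0     # rough propagation speed quoted in the statement


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def rough_transit_seconds(distance_km):
    """Approximate travel time at the quoted ~4 km/second speed."""
    return distance_km / WAVE_SPEED_KM_S
```

For instance, `rough_transit_seconds(haversine_km(lat_a, lon_a, lat_b, lon_b))` gives a first-order estimate of how long a signal takes to travel between two locations; the precise formula referenced above should be preferred where it differs.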
You should implement the init method, which provides you with information about the sample rate, the number of sites and their locations. The init method will be called once for every test case. sampleRate contains the number of samples measured (H) in each second. numOfSites contains the number of sites (S). The sitesData array contains the location of each site. The latitude of the ith site is available in sitesData[i*2] and the longitude in sitesData[i*2+1]. You can return any integer; the value will be ignored.
You should implement the forecast method. The method will be called once for each hour of data. hour contains the zero-based index of the hour to which the data belongs. The data array contains the measurements for all sites and all their channels for the specific hour. The array will contain H*3600*S*3 elements. The ith measurement for channel c from site j will be at data[j*(H*3600*3) + c*(H*3600) + i]. The values in the data array are in the range [0,2^24]. Measurements are not always available: a value of -1 in the data array indicates that the measurement is not provided. K contains the planetary magnetic activity index at the given hour and is in the range [0,10]. globalQuakes contains information about earthquakes that happened at other locations during the hour. The latitude of the ith quake is available in globalQuakes[i*5] and the longitude in globalQuakes[i*5+1]. The depth of the quake is in globalQuakes[i*5+2] and the magnitude in globalQuakes[i*5+3]. globalQuakes[i*5+4] contains the time in seconds from the start of the test case when the earthquake happened.
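The flat layouts described above can be unpacked as in the following minimal sketch. The helper names are hypothetical and not part of the required interface; only the index arithmetic and the -1 marker come from the statement.

```python
MISSING = -1  # marker for an unavailable measurement, per the statement


def channel_slice(data, H, site, channel):
    """Return the one-hour series for `channel` of `site` from the flat data array."""
    n = H * 3600                          # samples per channel per hour
    start = site * (n * 3) + channel * n  # data[j*(H*3600*3) + c*(H*3600) + i] with i = 0
    return data[start:start + n]


def mean_ignoring_missing(samples):
    """Mean of the available samples, or None if every sample is missing."""
    valid = [v for v in samples if v != MISSING]
    return sum(valid) / len(valid) if valid else None


def parse_global_quakes(global_quakes):
    """Split the flat array into (lat, lon, depth, magnitude, time_s) tuples."""
    return [tuple(global_quakes[i:i + 5])
            for i in range(0, len(global_quakes), 5)]
```

Skipping missing samples before computing any statistic matters here, because a raw -1 would otherwise be mixed in with valid readings from the range [0,2^24].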
You should return a matrix N of size S * 2160. Each value in the matrix should be the probability of an earthquake event happening at site j at time t. In other words, the value at return[t*S + j] should contain the probability of an earthquake event at site j at hour t.
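As an illustration of the return layout only, the sketch below builds the flat array for a forecast that assigns the same probability to every cell. The function names are hypothetical, and a real solution would adjust these values based on the signal.

```python
HOURS = 2160  # 90 days of hourly cells, per the statement


def cell_index(t, j, num_sites):
    """Position of (hour t, site j) in the returned flat array: t*S + j."""
    return t * num_sites + j


def uniform_forecast(num_sites):
    """Flat list of length HOURS * num_sites with equal probability in every cell."""
    total = HOURS * num_sites
    return [1.0 / total] * total
```

A forecast for a specific cell can then be changed with `result[cell_index(t, j, S)] = p` before the array is returned.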
Each test case will have only one earthquake event. Your forecast method will stop being called once the event has occurred.
Your forecast will be scored from day 32 (hour 768) onwards, until the earthquake event occurs.
Your returned matrix N will be normalized in the interval from hour max(hour, 768) to 2160. Let NN be the normalized sub-matrix of N. Let G be the entry of NN that corresponds to the hour at which the actual event happened and to the site closest to the epicenter. Your score (F) for a single hour will then be:
F = sizeof(NN) * ( 2 * G - Sum of squared values in NN) - 1
The scoring system is designed so that your algorithm receives a zero score on a test if it always returns equal probabilities for all hourly cells. The scoring formula is also built so that adding random noise to the output reduces the expected score.
Your raw score for a test case will then be the sum of all the F scores. Your total score is equal to the sum of raw scores on all test cases. If your total score is negative, you will receive a zero total score. The total score shown on the leaderboard will be divided by the number of hours for which F is calculated on all test cases and then multiplied by 1000000.
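The scoring rules above can be sketched as follows for offline experiments. This is not the official scorer: it assumes that "normalized" means dividing the sub-matrix by its sum so that the entries add up to 1 (which is consistent with a uniform forecast scoring zero), and that sizeof(NN) is the number of entries in NN. The function names are hypothetical.

```python
def hourly_score(sub_matrix, event_index):
    """F for one hour, given the flat sub-matrix of N and the index of the event cell."""
    total = sum(sub_matrix)
    if total <= 0:
        raise ValueError("sub-matrix must contain positive probability mass")
    nn = [v / total for v in sub_matrix]  # assumed normalization: entries sum to 1
    g = nn[event_index]                   # G: entry at the actual event cell
    sum_sq = sum(v * v for v in nn)
    return len(nn) * (2 * g - sum_sq) - 1


def leaderboard_score(f_values):
    """Clamp the summed F scores at zero, average per scored hour, scale by 1000000."""
    total = max(0.0, sum(f_values))
    return 1_000_000 * total / len(f_values)
```

With four cells, a uniform forecast gives F = 0, putting all mass on the correct cell gives F = 3, and putting all mass on a wrong cell gives F = -5, which shows how heavily confident wrong forecasts are penalized.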
The training data sets consist of 50% of all the sets we have available for the contest. The data can be accessed here.
- The 4 example test cases will execute your algorithm on these training data seeds: 4, 6, 7 and 8. The training set consists of 75 test cases.
- The 30 provisional test cases will use 20% of the data and are different from the training data.
- The 45 system test cases will use the remaining 30% of the data and are also different from the training and provisional data.
A visualization tool is provided for offline testing and can be downloaded here.
We are very excited to give you the opportunity to work on real, not artificial, earthquake data. Given that, we want to clarify specifically that your solution is not allowed to use external data sources or to exploit any type of overlap between test cases in order to localize the timing of the event or to favor a particular location. Your solution must presume that the chance of the event is equal for all the hours and all the sites of the test case, and modify those probabilities based only on the analysis of the signal from the given test case. The solutions will be screened and tested after the contest, and those that do not comply with this requirement will be disqualified. Please don't hesitate to send us a request if you are thinking of applying a technique that might be borderline. We want this to be an exciting scientific contest, so help us to make it fair!