1st place - $2,500
2nd place - $2,000
3rd place - $1,500
4th place - $1,000
The client is looking to run a contest to better understand how trading volume affects the market prices of traded securities. Contestants will use the supplied trade data to create an algorithm that attempts to predict swap prices.
The 2010 Dodd–Frank Wall Street Reform and Consumer Protection Act (the Dodd-Frank Act) created new entities called swap data repositories (SDRs) “in order to provide a central facility for swap data reporting and collecting. Under the Dodd-Frank Act, all swaps, whether cleared or uncleared, are required to be reported to registered SDRs.” As of January 2013, all registered swap dealers active in credit and interest rate trading send trade data to the public swap repository. Depending on their size and type (e.g., block trades), swap transactions must be reported within 5 to 15 minutes of execution. These developments have increased the availability of swap trade data. An extract of this data for a specified time period is supplied for this challenge.
Supply and demand in the swap market affect swap prices. Swap prices are also influenced by tenor, the maturity of the swap measured in full years, such as 2, 3, 5, 7, 10, and 30. We are interested in using the volume of vanilla US$ / Libor spot start swap transactions of the full-year maturities to predict the prices of those same instruments over relatively short time intervals.
The scoring will focus on the full-year tenors in PriceData. In SwapData, you may also receive some irregular tenors of the form *Y*M.
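When filtering SwapData rows, it can help to normalize tenor strings into a single unit. The helper below is an illustration (not part of the supplied data description) that parses both full-year tenors and the irregular *Y*M form into months:

```python
import re

def tenor_months(tenor):
    # Parse tenors such as "5Y" (full year) or the irregular "*Y*M" form
    # (e.g. "1Y6M") into a total number of months. Full-year tenors are
    # exactly those where the month count is a multiple of 12.
    m = re.fullmatch(r"(?:(\d+)Y)?(?:(\d+)M)?", tenor)
    if not m or tenor == "":
        raise ValueError("unrecognized tenor: %r" % tenor)
    years = int(m.group(1) or 0)
    months = int(m.group(2) or 0)
    return years * 12 + months
```

A tenor is then scorable when `tenor_months(t) % 12 == 0`.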
Public Swap Repository Data:
- Timestamp – The time stamp when the mid price is recorded.
- Tenor – The trade instrument.
- ABC mid – Mid price for trades from source ABC.
- DEF mid – Mid price for trades from source DEF.
Example data is provided for Jan and Feb 2016 (i.e., 2 months). Two additional months will be used for the provisional tests and the system tests, respectively.
- Time Stamp: The time when the trade happened.
- Price: The traded price at which level the transaction happens.
- Size: The size of this trade.
- Tenor: The trade instrument.
- Trade Direction: Whether someone buys or sells.
- ABC/DEF: Trades on ABC or DEF.
The evaluation runs in streaming mode. That is, predictions are made whenever you receive new data, and each prediction is compared to the mid prices observed within a short period (e.g., 5 to 10 minutes) after the latest data you have. The data will be sent in strictly chronological order.
Your task is to implement two methods: update and predict, whose signatures are detailed in the Definition section below. Both methods will be called several times.
In update, you will receive new data with timestamps. More specifically, you will receive two lists of comma-separated strings (enclosed in quotes). The columns appear in the same order as in the data description.
In predict, you should return a list of predictions in the same order as the received test data. The test data has the same format as the Price Data, but without the “ABC mid” and “DEF mid” columns. Each prediction is a string containing two values separated by a comma: the predicted ABC mid and DEF mid of the specified tenor at the specified time. For example, “1.002,2.000” (without quotes) could be a prediction.
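The exact method signatures are given in the Definition section; as a minimal sketch, assuming the inputs are plain Python lists of CSV strings with the columns described above, a last-seen-mid solution could look like this (the class name and prediction rule are illustrative, not required):

```python
class SwapPricePredictor:
    def __init__(self):
        # Most recent (ABC mid, DEF mid) observed per tenor.
        self.last_mid = {}

    def update(self, price_data, trade_data):
        # price_data rows: "Timestamp,Tenor,ABC mid,DEF mid"
        for row in price_data:
            ts, tenor, abc_mid, def_mid = row.split(",")
            self.last_mid[tenor] = (float(abc_mid), float(def_mid))
        # trade_data (volume/direction) could feed a richer model;
        # it is ignored in this sketch.

    def predict(self, test_data):
        # test rows: "Timestamp,Tenor"; one "abc,def" string per row.
        out = []
        for row in test_data:
            ts, tenor = row.split(",")
            abc, dfm = self.last_mid.get(tenor, (0.0, 0.0))
            out.append("%.6f,%.6f" % (abc, dfm))
        return out
```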
Submissions will be scored by running the solution against data from different time periods. Before the first call to predict, at least 2 hours of data will be given so that you have a reasonable volume of data with which to build your model.
The generation of a test case is as follows:
1. Randomly select 2~3 consecutive days from the given time period. For instance, in the example tests we will select 2~3 days from Jan and Feb 2016. Two days are considered consecutive if only holidays and weekends fall between them.
2. Use the first 2 hours of data for the first update.
3. Call predict for the next random 5~10 minutes.
4. Call update with the data from the next random 10~30 minutes. We will keep adding 10~30 minute chunks until there is a certain amount of data.
5. Go to step 3 if there is data left. Otherwise, the test case ends.
In every test case, the raw error is calculated as

rawErr = 0
for i = 1 to N:
    rawErr += (ABCTruth[i] - ABCPred[i])^2 + (DEFTruth[i] - DEFPred[i])^2

where N is the total number of predictions.
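For local testing, the pseudocode above translates directly into Python (function name and list-based inputs are my own choices):

```python
def raw_error(abc_truth, abc_pred, def_truth, def_pred):
    # Sum of squared errors over both sources, exactly as in the
    # rawErr formula above; the lists are aligned by prediction index.
    return sum((at - ap) ** 2 + (dt - dp) ** 2
               for at, ap, dt, dp in zip(abc_truth, abc_pred,
                                         def_truth, def_pred))
```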
As a naive solution, we will use the average price of all seen data of the same tenor as the baseline. For example, to predict the ABC mid for 3Y, all 3Y ABC mids seen so far are averaged to form the ABCPred. If no such data has been seen before, the prediction is 0. The raw error computed for this method serves as our baseErr.
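The baseline described above can be reproduced locally with per-tenor running averages; this is a sketch for self-checking your baseErr, not the organizers' reference code:

```python
from collections import defaultdict

class BaselinePredictor:
    # Average of all ABC/DEF mids seen so far, kept per tenor;
    # predicts 0 for a tenor never observed, as the baseline specifies.
    def __init__(self):
        # tenor -> [abc_sum, def_sum, count]
        self.sums = defaultdict(lambda: [0.0, 0.0, 0])

    def observe(self, tenor, abc_mid, def_mid):
        s = self.sums[tenor]
        s[0] += abc_mid
        s[1] += def_mid
        s[2] += 1

    def predict(self, tenor):
        s = self.sums.get(tenor)
        if s is None or s[2] == 0:
            return 0.0, 0.0
        return s[0] / s[2], s[1] / s[2]
```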
The raw score will be
raw score = max(0, 1 - rawErr / baseErr)
The final score of each test case is the raw score multiplied by 1000000.0, and the score shown on the standings is the average score over the test cases.
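Putting the two formulas together, the per-test-case score can be computed as follows (the handling of a zero baseErr is my assumption; the statement does not specify that case):

```python
def final_score(raw_err, base_err):
    # raw score = max(0, 1 - rawErr / baseErr), scaled by 1000000.0.
    if base_err <= 0:
        return 0.0  # assumption: a degenerate baseline scores zero
    return max(0.0, 1.0 - raw_err / base_err) * 1000000.0
```

Note that matching the baseline exactly (rawErr == baseErr) scores 0, and doing worse than the baseline is clipped to 0 rather than going negative.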
Requirements to Win A Prize
In order to receive a prize, you must do all the following:
- Achieve a score in the top 4, according to system test results. See the "Scoring" section below.
- Create a legitimate algorithm that runs successfully on a different data set with the same fields. Hard-coded solutions are unacceptable.
- Within 7 days from the end of the challenge, submit a complete report, at least 2 pages long, outlining your final algorithm and explaining the logic behind it and the steps of your approach. The required content and format appear in the "Report" section below.
- Within 7 days of the end of the challenge, submit all code used in your final algorithm in 1 appropriately named file (or tar or zip archive). We will contact the winners via email and ask for the file. The naming convention should be memberHandle-ContestName. For example, handle "johndoe" would name his submission "johndoe-ContestName."
Your report must be at least 2 pages long, contain at least the following sections, and use the section and bullet names below.
This section must contain at least the following:
- First name
- Last name
- Topcoder handle
- Email address
Please describe your algorithm so that we know what you did even before seeing your code. Use line references to refer to specific portions of your code.
This section must contain at least the following:
- Approaches considered
- Approach ultimately chosen
- Steps to approach ultimately chosen, including references to specific lines of your code
- Open source resources and tools used, including URLs and references to specific lines of your code
- Advantages and disadvantages of the approach chosen
- Comments on libraries
- Comments on open source resources used
- Special guidance given to algorithm based on training
- Potential improvements to your algorithm
If you place in the top 4 but fail to do any of the above, then you will not receive a prize, and it will be awarded to the contestant with the next-best performance who did all of the above.
- Only the data used in the Example Test will be released; you can download it here.
- In order to receive the prize money, you will need to fully document your code and explain your algorithm. If any parameters were obtained from the training data set, you will also need to provide the program used to generate these parameters. There is no restriction on the programming language used to generate these training parameters. Note that all this documentation should not be submitted anywhere during the coding phase. Instead, if you win a prize, a TopCoder representative will contact you directly in order to collect this data.
- You may not use any external (outside of this competition) source of data to train your solution.