 Yaw Alignment Marathon Challenge
Prize Distribution
1st place  $17,000
2nd place  $7,000
3rd place  $4,000
4th place  $2,000
5th place  $1,000
Introduction
We are looking for solutions that can compute the yaw misalignment angle of wind turbines from analysis of Supervisory Control and Data Acquisition (SCADA) data. The submitted solutions will be validated using yaw misalignment values obtained from another data source. We hope to receive wonderful solutions. Good luck!
Important Note: to avoid overfitting to the provisional test, we will only provide the leaderboard after Nov 12, 2018 EDT.
Requirements to Win a Prize
In order to receive a prize, you must do the following:
 Achieve a score in the top 6, according to system test results. The score must be higher than the baseline result (i.e., 200,000). See the "Data Description and Scoring" section below.
 We will run an additional round of system testing, which we refer to as the VMsys test. The top 6 contestants according to the system test will be asked to submit their code and a step-by-step deployment guide within 48 hours. We will evaluate the submissions on an AWS EC2 instance (p2.xlarge if a GPU is required, otherwise c5.2xlarge). The main purpose is to rule out look-ahead bias, i.e., the use of any future data.
 The code must be in Python or R.
 You should make sure the licenses of all dependent libraries are commercially friendly.
 You must provide a bash script, i.e., run.sh, which takes two arguments as input: (1) the name of a folder that contains the SCADA data. That folder will hold several CSV files containing all data before a certain date. Each file covers a different time period and is named after its starting date (YYYYMMDD). Currently, the files are split on a yearly basis. If you would like them split at a finer granularity (e.g., 3 months or 6 months), please state this in your submission. We will try to accommodate your request (or you can include a script to generate them). (2) the filename of the output txt file for the predictions. It has 3*N lines, where N is the number of turbines. For every turbine, the first line is the turbine name; the second line must be next week's single-value prediction; and the third line contains 1008 values separated by spaces, representing the 10-minute values for the following week. The turbines must be sorted in alphabetical order.
 The whole script must be finished within 1 hour.
 The contest winners are determined based on the VMsys test results, i.e., a re-ranking of the top 6.
 If your VMsys test score is more than 200,000 but below 600,000, you will receive 30% of the corresponding prize money.
 If you achieve 600,000 or more, you will receive full prize money according to your position.
 Within 7 days of the announcement of the contest winners, submit a complete report of at least 2 pages that: (1) outlines your final algorithm and (2) explains the logic behind and the steps of your approach. Additionally, you should bundle your code and write a deployment guide so that we can run it easily on the Azure Databricks platform. The output file should follow the format described in the output file section below.
If you place in the top 6, but fail to do all of the above, then you will not receive a prize, which will be awarded to the contestant with the next best performance who completed the submission requirements above.
Note that your submission will be disqualified if you use validation data to train your model.
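As an illustration of the prediction file that run.sh must produce (3 lines per turbine, turbines in alphabetical order), here is a minimal Python sketch. The function name format_predictions, the dictionary layout, and the 3-decimal formatting are assumptions made for this example and are not part of the official specification.

```python
VALUES_PER_WEEK = 7 * 24 * 6  # 1008 ten-minute intervals in 7 days


def format_predictions(predictions):
    """Build the text of the prediction file.

    `predictions` is a hypothetical structure for this sketch:
    {turbine_name: (weekly_value, [1008 ten-minute values])}.
    """
    lines = []
    # sorted() orders names by code point, which matches alphabetical
    # order for names such as "B01", "B02" (an assumption about naming).
    for name in sorted(predictions):
        weekly_value, series = predictions[name]
        if len(series) != VALUES_PER_WEEK:
            raise ValueError(f"{name}: expected {VALUES_PER_WEEK} values, got {len(series)}")
        lines.append(name)                                    # line 1: turbine name
        lines.append(f"{weekly_value:.3f}")                   # line 2: weekly single value
        lines.append(" ".join(f"{v:.3f}" for v in series))    # line 3: 1008 values
    return "\n".join(lines) + "\n"
```

The returned string can then be written to the output filename that run.sh receives as its second argument.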
Background
The business objective is to optimize energy generation at each WTG (Wind Turbine Generator) by dynamic calculation of the yaw misalignment angle. We seek a predictive model that calculates yaw misalignment angles in 10-minute intervals for the following 7 days using only historical SCADA data, and that also provides an optimal yaw misalignment correction value for the 7-day period.
In wind turbines, correct alignment of the nacelle to the main wind direction is necessary for optimal power generation and thus for maximizing the annual energy production. Considerable evidence and analysis suggest that a 4 degree deviation of the nacelle with respect to the true wind direction results in an AEP (Annual Energy Production) loss of 1%. Ideally, the yaw misalignment angle should be 0 degrees, but we will allow solutions that produce correction values within 2 degrees of the true wind direction.
A power curve derived from an operating wind turbine describes the relationship between wind speed at hub height and the power actually generated. Power curves help in energy assessment and performance monitoring of wind turbines. If the wind vector is perpendicular to the rotor area, the turbine performs optimally, but large inflow angles relative to the plane of the rotor, caused by yaw misalignment, lead to lower performance. The figure below depicts a typical measured power curve, before and after yaw misalignment correction.
The improvement in generated power through yaw correction based on the dynamic yaw misalignment values given by your submitted algorithm/model will be measured using such power curves.
Key Data challenge: We have SCADA data that covers a period of 5+ years but the other data source only characterizes a few turbines for a much shorter time period. Hence we would like to use SCADA data to build and train models that can be used to predict the yaw misalignment for any other turbine independently.
Related Topcoder Challenge: Previously, we had launched a related ideation challenge. The winning solution provided an accurate model that used a small subset of data, but still relied on other data sources. You can find that submission in an attachment after you register for the Marathon challenge.
Objective
In this Marathon challenge, your submitted algorithm/model would need to produce yaw misalignment value predictions in 10 minute intervals for the following 7 days, i.e. it will generate a total of 1008 values. Your system should also generate a single optimum value from these 1008 values for next 7 days. Specifically, at a specific time T, you are given SCADA data before time T + 7 days for training. You need to estimate the yaw misalignment values for the time period of [T, T + 7 days] at 10 minute intervals, i.e., [T, T + 10 mins], [T + 10 mins, T + 20 mins], … [T + (10 mins * 1007) ; T + 7 days]. Then, you will need to aggregate these 1008 values into a single value.
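To make the interval arithmetic concrete, the following Python sketch enumerates the 1008 ten-minute windows that start at time T and then reduces 1008 predicted values to one number. The statement does not say how the single value should be derived, so the median used here is only an assumed placeholder; the function names are likewise hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median

STEP = timedelta(minutes=10)
N_INTERVALS = 7 * 24 * 6  # 6 intervals per hour * 24 hours * 7 days = 1008


def forecast_intervals(t):
    """Return the 1008 (start, end) windows covering [t, t + 7 days]."""
    return [(t + i * STEP, t + (i + 1) * STEP) for i in range(N_INTERVALS)]


def aggregate_weekly(values):
    """Collapse 1008 ten-minute predictions into one weekly value.

    The median is an assumption for this sketch, not a rule of the contest.
    """
    if len(values) != N_INTERVALS:
        raise ValueError(f"expected {N_INTERVALS} values, got {len(values)}")
    return median(values)
```

For example, forecast_intervals(datetime(2018, 5, 19)) starts with the window 00:00 to 00:10 on 19 May and ends with the window that closes at 00:00 on 26 May.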
At WTG Unit level
Data Description
SCADA data of all turbines is being provided to the community. We will provide a few data points from the other data source that we mentioned above so that you can perform your own validation.
The URL for downloading the data will be provided in the discussion forum.
There are multiple columns in both types of data. We also provide codebooks in spreadsheet form that explain the meaning of the column names. The YMA (in degrees) column in the other data is the yaw misalignment you are going to estimate.
Important Domain Knowledge on Ground Truth
Below are a few possible approaches to identify the target variable. Please note that each of them is only one of the many approaches that can be used to solve the business problem. Contestants are at complete liberty to use any SCADA tag as a target variable, as long as doing so results in an algorithm/model that produces the most accurate result.

In the absence of a target variable (yaw misalignment) in the SCADA data, we can possibly assume that the top 10% of the available records, sorted in descending order of Active Power within each wind speed bin, have minimum yaw misalignment. We can take the RWD (Relative Wind Direction) of these records as the ground truth (i.e., the yaw misalignment).
A detailed analysis of this approach is shared as a separate PDF document, and we tested it using sample data.
RWD in SCADA is the Relative Wind Direction w.r.t. Nacelle Position.
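The record selection described above (top 10% by Active Power within each wind speed bin, with RWD taken as the target) could be sketched in Python as follows. The dictionary keys wind_speed, active_power and rwd, as well as the 0.5 m/s bin width, are assumptions for illustration; the real column names are given in the codebooks.

```python
import math
from collections import defaultdict


def select_reference_records(records, bin_width=0.5, top_fraction=0.10):
    """Keep the highest-power records of every wind speed bin.

    `records` is an iterable of dicts with the hypothetical keys
    "wind_speed", "active_power" and "rwd".
    """
    bins = defaultdict(list)
    for rec in records:
        # Records whose wind speed falls in the same bin share a key.
        bins[math.floor(rec["wind_speed"] / bin_width)].append(rec)

    selected = []
    for key in sorted(bins):
        ranked = sorted(bins[key], key=lambda r: r["active_power"], reverse=True)
        # Keep at least one record so that sparse bins are not dropped.
        keep = max(1, int(round(len(ranked) * top_fraction)))
        selected.extend(ranked[:keep])
    return selected
```

The "rwd" values of the returned records would then serve as the training target under this approach.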
P_{max} explanation
 θ_{E} is the Total Yaw error and not Yaw misalignment (denoted by angle A in the diagram). The difference between two is as follows:
 θ_{E} (Total Yaw error) = Angle b/w Nacelle position and True wind direction.
 A (Yaw Offset) = Angle b/w True Wind Direction and Relative Wind Direction
P → P_{max} only when θ_{E}→0;
By definition, θ_{E} is the angle b/w NP (Nacelle Position) and True Wind Direction (TWD), so when θ_{E} → 0, NP = TWD. A is caused by wind vane error, so it won't be 0 in figure 2 (right side).
 Total yaw misalignment gets partially corrected by the OEM's auto-yawing mechanism. The remaining error is due to the inaccuracy of the wind vane's measurement of wind direction and is termed Yaw Offset, which we need to correct
 No data point in SCADA gives the wind vane error (Yaw Offset) or the true wind direction, which is why we considered the following approach as a probable solution:
 When the total θ_{E} (Total Yaw error) → 0, the inaccuracy of the wind vane measurement is still there and needs to be captured
 When the total θ_{E} (Total Yaw error) → 0 , Nacelle Position = True Wind direction
 We can capture Relative wind direction (RWD) for these records and train the model using this RWD as Target variable
 In fig. 2 Total Yaw error is 0 however Yaw Offset is not 0
 A similar approach using the SCADA P_{max} value was brought to our attention through the ideation challenge (see https://www.researchgate.net/publication/323592878_DataDriven_Method_for_Wind_Turbine_Yaw_Angle_Sensor_ZeroPoint_Shifting_Fault_Detection). However, we have not created or tested the ground truth using that ideation approach, as it encourages the use of different bin sizes of wind speed and yaw angle to arrive at the best power-generation bucket.
You are free to take either approach using SCADA as Target variable, provided it meets the validation and success criteria.
Wake effect: Lat and Long details are given separately for your reference, which can be used to understand the distance between the turbines.
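One common way to turn latitude/longitude pairs into turbine-to-turbine distances is the haversine great-circle formula. The Python sketch below is only a suggestion; the statement does not prescribe a distance method, and the mean Earth radius of 6371 km is an assumed constant.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Pairwise distances computed this way could be combined with wind direction to judge which turbines are likely to sit in another turbine's wake.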
Model Success Criteria
 We will evaluate your model using SCADA data only. The input will be SCADA data ONLY and the output will be compared against validation data.
 The predictions produced by your system need to be within 2 degrees, on either side, of our weekly validation data
 With wake effect consideration
Implementation
In your submission, you are asked to submit a link to a CSV file containing the weekly single-value estimation for all turbines in the following time periods:
 19/05/2018 to 25/05/2018
 26/05/2018 to 01/06/2018
 02/06/2018 to 08/06/2018
 09/06/2018 to 15/06/2018
 24/08/2018 to 30/08/2018
 28/08/2018 to 03/09/2018
The CSV file's header should be "Turbine", "Date Range", and "Weekly Estimation YAW Error". For example,
Turbine,Date Range,Weekly Estimation YAW Error
B01,19/05/2018 to 25/05/2018,1.23
B01,26/05/2018 to 01/06/2018,3.21
...
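A file in this layout can be produced with Python's standard csv module, as in the sketch below. The function name build_submission_csv and the two-decimal formatting are assumptions made for the example; only the header and the column order come from the statement.

```python
import csv
import io

HEADER = ["Turbine", "Date Range", "Weekly Estimation YAW Error"]


def build_submission_csv(rows):
    """Return the CSV text for an iterable of (turbine, date_range, estimate)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    for turbine, date_range, estimate in rows:
        # Two decimals mirrors the example rows; the precision is not mandated.
        writer.writerow([turbine, date_range, f"{estimate:.2f}"])
    return buffer.getvalue()
```

Calling it with the two example rows shown above yields the same three lines, header included.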
In the provisional test, we will evaluate the first 2 time periods. In the system test, we will evaluate the remaining 4 time periods. In the VMsys test, we will re-evaluate the remaining 4 time periods.
The link must be directly downloadable. One option is to use Dropbox. Once you have copied the Dropbox link, you need to change "dl=0" to "dl=1" at the end of the link.
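The link adjustment can also be scripted. The Python sketch below rewrites the dl query parameter using the standard urllib.parse module; the function name is hypothetical and the URL in the usage note is made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_direct_download(link):
    """Return `link` with its dl query parameter forced to 1."""
    parts = urlsplit(link)
    # Drop any existing dl parameter, keep everything else untouched.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "dl"]
    query.append(("dl", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For instance, a made-up link ending in "predictions.csv?dl=0" comes back ending in "predictions.csv?dl=1".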
Scoring
For each test case, we will call predict exactly once. Based on the turbines where we have the ground truth labels, we calculate the Mean Square Error (MSE). For all test cases, we first calculate the average of the MSEs, and then take a square root of it. The square root is denoted as RMSE.
The final score will be max(0, (5 - RMSE) / 5 * 1,000,000). That is, when RMSE is greater than 5, you will receive a zero final score.
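The scoring rule can be reproduced locally with a few lines of Python. This sketch reads the formula as max(0, (5 - RMSE) / 5 * 1,000,000), which matches the statement that an RMSE above 5 scores zero. The data layout (one dict of predictions and one dict of ground-truth labels per test case) and the function names are assumptions for the example.

```python
import math


def mean_squared_error(predicted, truth):
    """MSE over the turbines that have a ground-truth label."""
    errors = [(predicted[name] - value) ** 2 for name, value in truth.items()]
    return sum(errors) / len(errors)


def final_score(test_cases):
    """Score a list of (predicted, truth) dict pairs, one pair per test case."""
    mses = [mean_squared_error(pred, truth) for pred, truth in test_cases]
    rmse = math.sqrt(sum(mses) / len(mses))  # root of the average MSE
    return max(0.0, (5.0 - rmse) / 5.0 * 1_000_000)
```

Under this reading, the 200,000 baseline corresponds to an RMSE of 4 degrees, and the 600,000 threshold for full prize money corresponds to an RMSE of 2 degrees.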
[Required for Winning Solution only] Output File
The details can be found here.
