- 1st Place - $10,000
- 2nd Place - $5,000
- 3rd Place - $2,500
- 4th Place - $1,500
- 5th Place - $1,000
Stunting (i.e., shortness for age) affects more than one in four children worldwide. Wasting (i.e., low weight for height) and stunting in early childhood are associated with lethargy, reduced levels of play, an increased risk of early death, a higher burden of disease, compromised physical capacities, and diminished cognitive development. Stunting and wasting in the first two years of life have been shown to be associated with lower school attainment and reduced economic productivity. This can reduce the productivity of an entire generation. Furthermore, stunting between 12 and 36 months has also been linked to poor cognitive performance and/or lower school grades in middle childhood, and both height and head circumference at 2 years have been shown to be inversely associated with educational attainment.
In this challenge, we explore the link between cognitive development and child stuntedness. While stuntedness is known to correlate with poor cognitive development, we are interested in finding out if this is reversible. Are there children who were born stunted but nevertheless are able to successfully overcome their slow cognitive development rates later in life? Furthermore, do children born stunted who overcome their small size (perhaps by adequate nutrition) also show increased cognitive ability? Are there external factors (either inherited from the parents or environmental) which can contribute positively to this recovery?
In pursuit of the above, we have collected time series measurements of child growth and family trait data (e.g., mother’s age, mother’s height, number of previous pregnancies, breast-feeding practices, and father’s height). You will need to use this data to predict a child’s IQ measured at 7 years of age in the attached dataset, which contains censored values. To test our hypotheses, we would like to predict IQ in 3 scenarios:
- Scenario 1: just using:
  - Columns 1, 12, 14, 15, 16, 17, 18
  - Weight at birth
  - Length at birth
  - Gestational age at birth
  - APGAR scores at birth
  - Sex at birth
- Scenario 2: everything in Scenario 1 above and:
  - Columns 2, 3, 4, 5, 6, 7, 8, 9, 10
  - Weight measurements at various points in the child’s growth
  - Length (i.e., height) measurements at various points in the child’s growth
  - Combinations of weight and length at various points in the child’s growth (e.g., weight/length^2, etc.)
  - Velocity changes in measures of weight and height (alternatively called growth tempo) between measurement periods
- Scenario 3: everything in Scenario 2 above and:
  - Columns 11, 13, 19, 20, 21, 22, 23, 24, 25, 26
  - Investigation site ID
  - Characteristics of the parents
  - The extent to which the child was breastfed
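The column lists for the three scenarios are cumulative, which can be sketched as a small lookup. This is a minimal sketch, not part of the official problem code: the function name `allowed_columns` is hypothetical, and the column numbers are the 1-based numbers given in the lists above. The `scenario` parameter passed to your code is documented as 0, 1, or 2, so a caller would presumably pass `scenario + 1` here.

```python
# Column numbers (1-based, as in the data set description) added by each scenario.
SCENARIO_1_COLUMNS = [1, 12, 14, 15, 16, 17, 18]
SCENARIO_2_EXTRA = [2, 3, 4, 5, 6, 7, 8, 9, 10]
SCENARIO_3_EXTRA = [11, 13, 19, 20, 21, 22, 23, 24, 25, 26]


def allowed_columns(scenario):
    """Return the sorted 1-based column numbers usable in Scenario 1, 2, or 3.

    `allowed_columns` is a hypothetical helper name, not part of the challenge API.
    """
    if scenario not in (1, 2, 3):
        raise ValueError("scenario must be 1, 2, or 3")
    columns = list(SCENARIO_1_COLUMNS)
    if scenario >= 2:
        columns += SCENARIO_2_EXTRA
    if scenario == 3:
        columns += SCENARIO_3_EXTRA
    return sorted(columns)
```

Under these lists, Scenario 3 covers all 26 predictor columns, and column 27 (the IQ to predict) is never an input.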
You may download the learning data set here.
Data Set Description

| Col# | Name | Type | Description/Notes |
|------|------|------|-------------------|
| 1 | subjid | int | Subject ID (in ascending order; not all values necessarily exist) |
| 2 | agedays | int | Age of child in days (Day 1 = day of birth) |
| 3 | wtkg | float | Weight (kg) |
| 4 | htcm | float | Standing height (cm) |
| 5 | lencm | int | Recumbent length (cm) |
| 6 | bmi | float | Body Mass Index (kg/m^2) |
| 7 | waz | float | Weight for age Z-score (per WHO algorithm) |
| 8 | haz | float | Height for age Z-score (per WHO algorithm) |
| 9 | whz | float | Weight for height Z-score (per WHO algorithm) |
| 10 | baz | float | BMI for age Z-score (per WHO algorithm) |
| 11 | siteid | int | Investigation site ID (several values in the range 5-82) |
| 12 | sexn | int | Sex of the child (1 = Male, 2 = Female) |
| 13 | feedingn | int | Breast feeding category: 1 = Exclusively breast fed, 2 = Exclusively formula fed, 3 = Mixture breast/formula fed, 90 = Unknown |
| 14 | gagebrth | int | Gestational age at birth in days |
| 15 | birthwt | int | Birth weight (grams) |
| 16 | birthlen | int | Birth length (cm) |
| 17 | apgar1 | int | APGAR score at 1 minute post birth |
| 18 | apgar5 | int | APGAR score at 5 minutes post birth |
| 19 | mage | int | Maternal age at birth of child (years) |
| 20 | demo1n | int | Maternal demographic variable 1 (nominal value 1 or 2) |
| 21 | mmaritn | int | Mother’s marital status: 1 = Married, 2 = Common law, 3 = Separated, 4 = Divorced, 5 = Widowed, 6 = Single |
| 22 | mcignum | int | Mother's # of cigarettes per day during pregnancy |
| 23 | parity | int | Maternal parity (# of previous live births at the time of this child’s birth) |
| 24 | gravida | int | Maternal gravidity (# of previous times pregnant) |
| 25 | meducyrs | int | Mother's education level (years) |
| 26 | demo2n | int | Maternal demographic variable 2 (nominal value 1, 2, 3, 4 or 5) |
| 27 | geniq | int | IQ measured at age 7 (the variable to predict) |

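Scenario 2 mentions combinations such as weight/length^2 and growth velocity between measurement periods. Using the agedays, wtkg, and htcm columns described above, such derived features could be computed roughly as follows. This is a minimal sketch under stated assumptions: the helper names `bmi` and `growth_velocities` are hypothetical, velocity is taken as a simple per-day difference between consecutive visits, and missing or censored values are assumed to have been filtered out beforehand.

```python
def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2 from weight in kg and height (or length) in cm."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m * height_m)


def growth_velocities(measurements):
    """Per-day change between consecutive visits of one child.

    measurements: list of (agedays, value) tuples, e.g. (agedays, wtkg).
    Returns a list of (agedays_start, agedays_end, change_per_day) tuples.
    Assumption: velocity is a plain finite difference, not a fitted growth curve.
    """
    ordered = sorted(measurements)
    velocities = []
    for (day0, value0), (day1, value1) in zip(ordered, ordered[1:]):
        if day1 > day0:  # skip duplicate visit days to avoid dividing by zero
            velocities.append((day0, day1, (value1 - value0) / (day1 - day0)))
    return velocities
```

For example, `growth_velocities([(1, 3.0), (31, 4.5)])` describes a child who gained 1.5 kg over 30 days.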
Code and Scoring
Your code will be given String[] training and String[] testing, and will need to return a double[] containing one value (the predicted IQ) for each ID present in the test data. The test data will be provided in order by ID, and your return values should be in that same order.
Your code will also be given ints testType and scenario, indicating the type of test and which of the three scenarios is being tested.
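As a reference point, the interface described above can be sketched with a baseline that predicts the mean training IQ for every test ID. This is only an illustration under assumptions the problem statement does not confirm: the function name `predict_iq` is hypothetical, each input string is assumed to be one comma-separated row in the column order of the data set description, subjid is assumed to be the first field, and geniq is assumed to be the last field of each training row.

```python
def predict_iq(training, testing, test_type, scenario):
    """Baseline: return the mean training IQ once per distinct test subject ID.

    Assumptions (not confirmed by the problem statement): rows are
    comma-separated, subjid is the first field, and geniq is the last
    field of every training row.
    """
    iq_values = []
    for row in training:
        fields = row.split(",")
        try:
            iq_values.append(float(fields[-1]))
        except ValueError:
            continue  # skip rows whose IQ field is missing or non-numeric
    if not iq_values:
        raise ValueError("no usable IQ values in the training data")
    mean_iq = sum(iq_values) / len(iq_values)

    # One prediction per distinct ID, in the order the IDs first appear.
    seen = set()
    predictions = []
    for row in testing:
        subject_id = row.split(",")[0]
        if subject_id not in seen:
            seen.add(subject_id)
            predictions.append(mean_iq)
    return predictions
```

A real solution would replace the constant `mean_iq` with a model fitted on the columns permitted for the given scenario.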
Your code will be scored by calculating the SSE (sum of squared errors) of your predictions. A baseline value, SSE0, will also be calculated by using the average of all IQ values in the training set as the prediction for every test ID. Your score for a test case will then be given by:
Score = 1000000 * MAX(0, 1 - SSE/SSE0)
Your overall score will be the average score across all test cases.
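The scoring rule can be reproduced locally to evaluate a model before submitting. The sketch below follows the formula above; the function names `score_test_case` and `overall_score` are hypothetical, and returning 0 when SSE0 is zero is an assumption, since the problem statement does not say how that case is handled.

```python
def score_test_case(actual, predicted, training_mean):
    """Score = 1000000 * MAX(0, 1 - SSE/SSE0) for one test case.

    actual, predicted: equal-length sequences of IQ values, in the same ID order.
    training_mean: average of all IQ values in the training set (baseline prediction).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sse0 = sum((a - training_mean) ** 2 for a in actual)
    if sse0 == 0:
        return 0.0  # assumption: the statement does not define this case
    return 1000000.0 * max(0.0, 1.0 - sse / sse0)


def overall_score(test_case_scores):
    """Average score across all test cases."""
    return sum(test_case_scores) / len(test_case_scores)
```

Predicting the training mean for every child scores exactly 0, so any positive score means the model beats that baseline.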
Notes on Data Set Generation
- The full data set contains approximately 138,000 lines, covering just over 30,000 ID values.
- The full data set is divided into 40% for example tests, 20% for provisional tests, and 40% for system tests. All data belonging to the same ID is placed in the same data set.
- For each test, approximately 66% of the data (from that segment) is selected for training, and the remainder for testing.
- For provisional tests, all example data is also added to the training set.
- For system tests, all example data and approximately 50% of the provisional data are also added to the training set.
- There are 10 example cases and 50 provisional test cases.
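For local validation, a similar split can be imitated on the downloaded learning data. The sketch below keeps all rows of one subject on the same side and uses roughly 66% of the IDs for training. It is an illustration only: the function name `split_ids` and the fixed seed are hypothetical, and splitting by ID (rather than by row) within a segment is an assumption based on the statement that all data for an ID stays together.

```python
import random


def split_ids(subject_ids, train_fraction=0.66, seed=0):
    """Split subject IDs into disjoint training and testing sets.

    subject_ids: iterable of IDs, possibly with repeats (one per data row).
    Returns (train_ids, test_ids) as two sets.
    Assumption: the split is made per ID, so no child appears on both sides.
    """
    unique_ids = sorted(set(subject_ids))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(unique_ids)
    cutoff = int(round(len(unique_ids) * train_fraction))
    return set(unique_ids[:cutoff]), set(unique_ids[cutoff:])
```

Rows are then assigned to the training or testing side according to which set their subject ID falls in.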
Notes on Time Limits
Because different test types deal with different volumes of data, the time limits will also differ. Example tests are limited to 360s (6 minutes), provisional tests to 540s (9 minutes) and system tests to 900s (15 minutes). The testType parameter will be 0, 1, or 2, to indicate Example, Provisional, or System test, respectively, so that your code can take timing into account. Similarly, the scenario parameter is also 0, 1, or 2, referring to the three scenarios listed above.
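The time limits above can be turned into a working budget keyed on testType. This is a minimal sketch: the names `time_budget` and `make_deadline_check` are hypothetical, and the default safety fraction of 0.8 is an arbitrary assumption meant to leave headroom for input parsing and other overhead.

```python
import time

# Time limits in seconds for testType 0 (Example), 1 (Provisional), 2 (System).
TIME_LIMITS_SECONDS = {0: 360, 1: 540, 2: 900}


def time_budget(test_type, safety_fraction=0.8):
    """Seconds the solution should allow itself for the given test type.

    safety_fraction is an assumed margin, not a value from the problem statement.
    """
    return TIME_LIMITS_SECONDS[test_type] * safety_fraction


def make_deadline_check(test_type, safety_fraction=0.8):
    """Return a function that reports whether the time budget has been used up."""
    deadline = time.monotonic() + time_budget(test_type, safety_fraction)
    return lambda: time.monotonic() >= deadline
```

A training loop could call the returned function once per iteration and stop refining the model as soon as it returns True.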