Contest: Morgoth's Crown
Problem: SpectrumPredictor

Problem Statement

    

Prize Distribution

              Prize             USD
  1st                        $20,000
  2nd                        $15,000
  3rd                        $10,000
  4th                         $2,000
  5th                         $1,000
  Progress prizes*
                   week 2     $1,000
                   week 4       $500
                   week 6       $500
Total Prizes                 $50,000
*see the 'Requirements to Win a Prize' section for details

Background and motivation

The MORGOTH'S CROWN Challenge (Modeling of Reflectance Given Only Transmission or High-concentration Spectra for Chemical Recognition Over Widely-varying eNvironments) is offered by IARPA (Intelligence Advanced Research Projects Activity), within the Office of the Director of National Intelligence (ODNI). IR spectrometers measure the signature of an unknown compound on a given surface, and a detection algorithm identifies the compound by comparing to a detection library. Even with a perfect spectral measurement, the identification is only as good as the correspondence between the detection library and the "real world" signatures. Currently the quality of these detection libraries, and the computational models that support them, are a bigger limitation to accurate chemical identification than the capabilities of the spectrometer hardware. This Challenge invites experts from across government, academia, industry and developer communities to create fast and accurate IR spectral models using new approaches that will advance technology and potentially foster enormous humanitarian impact. IARPA will provide solvers with spectra of training coupons and bulk target chemical data, and solvers are asked to generate an algorithm to predict the sample coupon spectra. See more background information about the challenge here.

Objective

Your task is to predict the infrared spectra of different chemicals, taking into account the effects of target chemical loading (mass, fill factor, film thickness), target chemical microstructure, and target chemical - substrate interaction. The spectra your algorithm returns will be compared to ground truth data, and the quality of your solution will be judged by how well it correlates with the expected results; see Scoring for details.

Input Files

Coupon spectra

In this task you will work with infrared spectra of sample coupons. A coupon consists of two components: a substrate and a target chemical. In this challenge the substrate can be one of the following materials: glass, polished aluminum, roughened aluminum, anodized aluminum and acrylic. The target chemical is one of the following: acetaminophen, caffeine, potassium nitrate (KNO3) and warfarin. Data corresponding to a spectrum is made up of two components:

  • (a) a text file that holds the measured spectrum data and
  • (b) a text file that contains additional meta data about the measurement like the thickness (load) of the target material on the substrate, the method used when preparing the coupons, and many more.

Both (a) and (b) are provided as training data for 2 of the 4 target chemicals (warfarin and potassium nitrate); only (b) is provided for the other 2 chemicals. Data corresponding to acetaminophen is used for provisional testing, and data corresponding to caffeine is used for final testing.

The format of (a) is the following:

  • The name of the file is <target>_<substrate>_<load>_<method>_HRF.txt where
    • <target> is the 3-letter abbreviation of the target chemical: one of {Ace, Caf, Pot, War}.
    • <substrate> denotes the substrate material, it can be one of {Glass, PA, RA, AA, acrylic} for glass, polished aluminum, roughened aluminum, anodized aluminum and acrylic, respectively.
    • <load> is the amount of target chemical spread on the substrate, measured in micrograms per square cm.
    • <method> is the coupon preparation method, one of {Ab, Sv} for airbrush and sieve, respectively
    • The <target>_<substrate>_<load>_<method> part of the filename will be referred to as coupon ID. An example coupon ID is War_AA_10.9_Ab.
  • The file contains 7152 {wavenumber, reflectance} pairs, one pair per line, with the two elements separated by a single TAB character. Values are listed in decreasing order of wavenumber.
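
The file naming and line format described above can be read with a few lines of Java. The following is only a sketch: the class CouponFiles and its methods are hypothetical names introduced for illustration and are not part of the contest materials. It assumes a coupon ID always has exactly four underscore-separated fields and that the load field is a plain decimal number.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the contest kit) for coupon IDs and spectrum text. */
final class CouponFiles {

    /** The four fields of a coupon ID such as War_AA_10.9_Ab. */
    static final class CouponId {
        final String target;
        final String substrate;
        final double load;      // micrograms per square cm
        final String method;

        CouponId(String target, String substrate, double load, String method) {
            this.target = target;
            this.substrate = substrate;
            this.load = load;
            this.method = method;
        }
    }

    /** Removes the "_HRF.txt" suffix from a spectrum file name to obtain the coupon ID. */
    static String couponIdFromFileName(String fileName) {
        String suffix = "_HRF.txt";
        if (!fileName.endsWith(suffix)) {
            throw new IllegalArgumentException("Not an HRF spectrum file: " + fileName);
        }
        return fileName.substring(0, fileName.length() - suffix.length());
    }

    /** Splits "<target>_<substrate>_<load>_<method>" into its fields. */
    static CouponId parseCouponId(String couponId) {
        String[] parts = couponId.split("_");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields in coupon ID: " + couponId);
        }
        return new CouponId(parts[0], parts[1], Double.parseDouble(parts[2]), parts[3]);
    }

    /** Parses TAB-separated {wavenumber, reflectance} lines and returns the reflectance column. */
    static double[] parseReflectances(List<String> lines) {
        List<Double> values = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // assumption: tolerate a trailing empty line
            }
            String[] pair = trimmed.split("\t");
            if (pair.length != 2) {
                throw new IllegalArgumentException("Expected 2 TAB-separated values: " + line);
            }
            values.add(Double.parseDouble(pair[1]));
        }
        double[] result = new double[values.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = values.get(i);
        }
        return result;
    }
}
```

The lines of a real spectrum file could be obtained with java.nio.file.Files.readAllLines; a complete file should yield an array of 7152 reflectances, already in decreasing order of wavenumber.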

The format of (b) is the following:

  • The name of the file is <coupon_id>_meta.txt where <coupon_id> is the same as defined above for spectrum data files. An example meta data file is War_AA_10.9_Ab_meta.txt.
  • The file contains several key-value pairs that describe the physical parameters of the coupon and the parameters of the measurement environment. The most important pieces of meta data are already present in the file name but there are some others in this file that you may find useful for modeling, most notably:
    • Particle size. In case of the 'sieve' coupon preparation method the grit of the sieve mesh is an important, controlled parameter of the process. In case of 'airbrushing' this parameter doesn't apply.
    • Surface Roughness, Film Thickness and Fill factor. See the reference document 'Preparation Methodology for Airbrushed Samples.pdf' for details on these parameters.

In addition to (a) and (b), there are other coupon spectra related files that may contain useful information:

  • <coupon_id>_DRF.txt files contain diffuse spectra data in a format identical to the _HRF.txt files (hemispherical spectra). See the reference document 'Spectral Measurement Procedure.pdf' for a discussion on hemispherical vs diffuse spectra.
  • <coupon_id>.pdf files contain physical properties and measurement parameters, images of the coupons and of the measurement equipment. Most of the textual information present in these files is also available in the <coupon_id>_meta.txt files.
  • <coupon_id>.pptx files contain coupon images, data on coupon surface roughness and fill factor, also plots of the HRF and DRF spectra.

Target chemical and substrate spectra

In addition to coupon spectra (i.e. combinations of target chemicals and substrates) you have access to spectra of the pure target chemicals and substrates.

  • The 'Bare Substrates' folder of the downloadable training.zip file contains pure spectra information of the 5 substrates in subfolders named None_<substrate>_Bare_NA, where <substrate> is one of {Glass, PA, RA, AA, acrylic} for glass, polished aluminum, roughened aluminum, anodized aluminum and acrylic, respectively. The format of the files within these folders is the same as defined above for coupons.
  • The 'Bulk Reflectance' folder contains pure spectra information of the 4 target chemicals in subfolders named <target>_None_Bulk_NA, where <target> is the 3-letter abbreviation of the target chemical: one of {Ace, Caf, Pot, War}. The format of the files within these folders is the same as defined above for coupons, with the addition of a <target>_None_Bulk_NA_PSD.txt file which gives the particle size distribution.
  • The 'Pellet Data' folder contains pure spectra information of the 4 target chemicals obtained in a different way than in the case of 'Bulk Reflectance'. Bulk reflectance data is measured by bouncing an IR beam off of a bulk sample of the target chemical (literally a pile of the chemical powder) and measuring the spectrum of the reflected light. Pellet absorbance data is measured by mixing a small amount of the chemical with potassium bromide powder (which is transparent in the infrared), pressing the mixture into a pellet, passing infrared light through the pellet and measuring the spectrum of the transmitted light. Pellet data is contained in subfolders named <target>_KBr_Bulk_Pellet. Note that pellet data spectra have a different spectral range and resolution than other spectra in this contest. Additionally, these subfolders contain <target>_nk_estimate.csv files that contain the Kramers-Kronig transform of the spectra, which may be useful for modeling.

Downloads

The following data files are available for download.

  • training.zip (80 MB). Contains all spectra and meta data for warfarin and potassium nitrate coupons, also pure target chemical and substrate spectra.
  • testing.zip. Contains meta data for acetaminophen coupons.

Output Files

Your output must be a text file that describes the spectra your algorithm predicts for all of the 18 coupons in a test set. The file should contain lines formatted like:

<coupon_id>,r1,r2,...,r7152

where

  • <coupon_id> is the (case sensitive) unique identifier of a coupon as defined in the Input Files section above. (Angle brackets are for clarity only, they should not be present in the file.)
  • r1,...,r7152 are your predicted reflectance values for the given coupon spectrum. These values must be listed in the same order as in the provided coupon spectra files, that is in decreasing order of wavenumbers.
  • The file must contain exactly 18 lines, one for each <coupon_id>. Each line must contain exactly 7153 values (a string and 7152 real numbers), separated by commas.

A sample line:

Ace_AA_205.6_Ab,0.28,0.27,0.25,...<truncated for brevity>...,0.98,0.99

Your output must be a single file with .txt extension. Optionally the file may be zipped, in which case it must have .zip extension.
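
As an illustration of the line format above, one output line could be assembled as in the following sketch. SubmissionWriter and formatLine are hypothetical names, and printing six decimal places is an assumption; the statement only requires real numbers separated by commas.

```java
import java.util.Locale;

/** Hypothetical helper (not part of the contest kit) that builds one line of the output file. */
final class SubmissionWriter {
    static final int SPECTRUM_LENGTH = 7152;

    /** Returns "<coupon_id>,r1,r2,...,r7152" for the given predicted reflectances. */
    static String formatLine(String couponId, double[] reflectances) {
        if (reflectances.length != SPECTRUM_LENGTH) {
            throw new IllegalArgumentException(
                "Expected " + SPECTRUM_LENGTH + " values, got " + reflectances.length);
        }
        StringBuilder line = new StringBuilder(couponId);
        for (double r : reflectances) {
            // Locale.ROOT keeps '.' as the decimal separator regardless of system settings.
            // Six decimal places is an assumption, not a contest requirement.
            line.append(',').append(String.format(Locale.ROOT, "%.6f", r));
        }
        return line.toString();
    }
}
```

A full submission would call formatLine once per coupon in the test set and write the resulting lines to a single .txt file, with the reflectances already in decreasing order of wavenumber.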

Your output must only contain algorithmically generated spectra predictions. It is strictly forbidden to include manually created predictions, or spectra that - although initially machine generated - are modified in any way by a human.

Functions

This match uses the result submission style, i.e. you will run your solution locally using the provided files as input, and produce a comma-separated text (.txt) file, optionally zipped, that contains your answer.

In order for your solution to be evaluated by Topcoder's marathon system, you must implement a class named SpectrumPredictor, which implements a single function: getAnswerURL(). Your function will return a String corresponding to the URL of your submission file. You may upload your files to a cloud hosting service such as Dropbox or Google Drive, which can provide a direct link to the file.

To create a direct sharing link in Dropbox, right click on the uploaded file and select share. You should be able to copy a link to this specific file which ends with the tag "?dl=0". This URL will point directly to your file if you change this tag to "?dl=1". You can then use this link in your getAnswerURL() function.

If you use Google Drive to share the link, then please use the following format: "https://drive.google.com/uc?export=download&id=" + id

Note that Google has a file size limit of 25MB and can't provide direct links to files larger than this. (For larger files the link opens a warning message saying that automatic virus checking of the file is not done.)

You can use any other way to share your result file, but make sure the link you provide opens the filestream directly, and is available for anyone with the link (not only the file owner), to allow the automated tester to download and evaluate it.
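
The two link conversions described above can be sketched as follows. ShareLinks and its methods are hypothetical names used only for illustration, and the sketch assumes the Dropbox share link ends with exactly "?dl=0" as the text states.

```java
/** Hypothetical helper (not part of the contest kit) for building direct download links. */
final class ShareLinks {

    /** Turns a Dropbox share link ending in "?dl=0" into a direct link ending in "?dl=1". */
    static String dropboxDirectLink(String shareUrl) {
        String tag = "?dl=0";
        if (!shareUrl.endsWith(tag)) {
            throw new IllegalArgumentException("Expected a link ending in " + tag + ": " + shareUrl);
        }
        return shareUrl.substring(0, shareUrl.length() - tag.length()) + "?dl=1";
    }

    /** Builds the Google Drive download URL from a file id, using the format given above. */
    static String googleDriveDirectLink(String fileId) {
        return "https://drive.google.com/uc?export=download&id=" + fileId;
    }
}
```

Either result is the kind of String that getAnswerURL() is expected to return. It is worth opening the link in a private browser window first to confirm that it downloads the file without a login.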

An example of the code you have to submit, using Java:

public class SpectrumPredictor  {
  public String getAnswerURL() {
    //Replace the returned String with your submission file's URL
    return "https://drive.google.com/uc?export=download&id=XYZ";
  }
}

Keep in mind that if you achieve a score in the top 5, the complete code that generates these results will be verified at the end of the contest, as described later in the "Requirements to Win a Prize" section. That is, participants will be required to provide fully automated executable software to allow for independent verification of the performance of their algorithm and the quality of the output data.

Scoring

A full submission will be processed by the Topcoder Marathon test system, which will download, validate and evaluate your submission file.

Any malformed or inaccessible file, or one that doesn't contain the expected number of lines will receive a zero score.
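
Because a malformed file scores zero, it may be worth checking the file shape locally before uploading. The sketch below is not the tester's actual validation logic, which is not published here; SubmissionCheck and isWellFormed are hypothetical names. It checks only the documented shape: the expected number of lines, and on each line a non-empty coupon ID followed by 7152 finite real numbers. It does not check that coupon IDs are correct or unique.

```java
import java.util.List;

/** Hypothetical local pre-check (not the official tester) for the documented file shape. */
final class SubmissionCheck {
    static final int VALUES_PER_LINE = 7153; // coupon ID plus 7152 reflectances

    /** Returns true if the lines match the documented shape; expectedLines is 18 (or 3 for examples). */
    static boolean isWellFormed(List<String> lines, int expectedLines) {
        if (lines.size() != expectedLines) {
            return false;
        }
        for (String line : lines) {
            // Limit -1 keeps trailing empty fields so a dangling comma is detected.
            String[] fields = line.split(",", -1);
            if (fields.length != VALUES_PER_LINE || fields[0].isEmpty()) {
                return false;
            }
            for (int i = 1; i < fields.length; i++) {
                try {
                    double value = Double.parseDouble(fields[i]);
                    if (Double.isNaN(value) || Double.isInfinite(value)) {
                        return false; // assumption: NaN and infinities are not valid reflectances
                    }
                } catch (NumberFormatException e) {
                    return false;
                }
            }
        }
        return true;
    }
}
```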

If your submission is valid, your solution will be scored using the following algorithm.

dist(s1, s2) is a function that measures the distance between two spectra; its definition is based on the SID(TAN) metric described in this paper.

Let s1 and s2 be two spectra, i.e. two arrays of real values, having equal length (n = 7152 in this contest). Then dist(s1, s2) is calculated as follows:

EPS = 1e-9;
sum1 = 0; sum2 = 0;
len1 = 0; len2 = 0;
for (i = 0; i < n; i++) {
    s1[i] = max(EPS, s1[i]);
    sum1 += s1[i];
    len1 += s1[i] * s1[i];
    s2[i] = max(EPS, s2[i]);
    sum2 += s2[i];
    len2 += s2[i] * s2[i];
}
len1 = sqrt(len1);
len2 = sqrt(len2);

sid = 0;
for (i = 0; i < n; i++) {
    a = s1[i] / sum1;
    b = s2[i] / sum2;
    sid += (a - b) * (log(a) - log(b));
}

sum = 0;
for (i = 0; i < n; i++) {
    sum += s1[i] * s2[i];
}
cosAngle = sum / (len1 * len2);
sam = acos(cosAngle);

dist = sid * tan(sam);
        

Here the tan() and acos() functions are the usual trigonometric functions, log() is natural logarithm, sqrt() is square root. Note that a distance of 0 means that the two spectra are identical.
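
For local experiments, the pseudocode above translates into Java roughly as follows. This is a sketch rather than the official scorer: the class name SpectralDistance is illustrative, the inputs are copied instead of being clamped in place, and cosAngle is capped at 1.0 so that rounding error cannot push it outside the domain of acos(). That cap is an addition that the pseudocode does not contain.

```java
/** Sketch of the dist() pseudocode above; not the official scorer. */
final class SpectralDistance {
    static final double EPS = 1e-9;

    static double dist(double[] s1, double[] s2) {
        if (s1.length != s2.length) {
            throw new IllegalArgumentException("Spectra must have equal length");
        }
        int n = s1.length;
        double[] p = new double[n]; // clamped copy of s1, so the caller's array is untouched
        double[] q = new double[n]; // clamped copy of s2
        double sum1 = 0, sum2 = 0, sq1 = 0, sq2 = 0, dot = 0;
        for (int i = 0; i < n; i++) {
            p[i] = Math.max(EPS, s1[i]);
            q[i] = Math.max(EPS, s2[i]);
            sum1 += p[i];
            sum2 += q[i];
            sq1 += p[i] * p[i];
            sq2 += q[i] * q[i];
            dot += p[i] * q[i];
        }

        // Spectral information divergence between the two normalized spectra.
        double sid = 0;
        for (int i = 0; i < n; i++) {
            double a = p[i] / sum1;
            double b = q[i] / sum2;
            sid += (a - b) * (Math.log(a) - Math.log(b));
        }

        // Spectral angle between the two spectra.
        double cosAngle = dot / (Math.sqrt(sq1) * Math.sqrt(sq2));
        cosAngle = Math.min(1.0, cosAngle); // assumption: guard against rounding error
        double sam = Math.acos(cosAngle);

        return sid * Math.tan(sam);
    }
}
```

With this sketch, comparing a spectrum with itself returns 0, in line with the note above that a distance of 0 means identical spectra.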

For each coupon in the test set a score will be calculated as

score(sPredicted) = max(0, 1 - (dist(sPredicted, sTruth) / (2 * dist(sSubstrate, sTruth)))),

where

  • 'sPredicted' is the spectrum your submission contains,
  • 'sTruth' is the expected ground truth spectrum,
  • 'sSubstrate' is the pure spectrum of the substrate that is used in the given coupon.

This means that coupon scores are normalized to the pure substrate spectra, so that a baseline solution that simply returns the substrate spectrum for a given substrate+chemical combination will get a score of 0.5.

Finally, your score will be the average of coupon scores calculated as above, multiplied by 1,000,000.
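
The per-coupon score and the final average can be sketched as below. CouponScoring and its method names are illustrative, and the two distances are assumed to have been computed beforehand with dist(). The statement does not say what happens when dist(sSubstrate, sTruth) is 0, so the sketch does not handle that case.

```java
/** Sketch of the scoring formulas above; names are illustrative, not official. */
final class CouponScoring {

    /**
     * @param distPredicted dist(sPredicted, sTruth)
     * @param distSubstrate dist(sSubstrate, sTruth), assumed to be greater than 0
     */
    static double couponScore(double distPredicted, double distSubstrate) {
        return Math.max(0.0, 1.0 - distPredicted / (2.0 * distSubstrate));
    }

    /** Average of the coupon scores, multiplied by 1,000,000. */
    static double finalScore(double[] couponScores) {
        double sum = 0;
        for (double s : couponScores) {
            sum += s;
        }
        return 1_000_000.0 * sum / couponScores.length;
    }
}
```

For example, a prediction equal to the bare substrate spectrum gives couponScore(d, d) = 0.5 for any d > 0, and a prediction more than twice as far from the truth as the bare substrate scores 0.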

Note that you may make full submissions once every 8 hours.

Example submissions can be used to verify that your chosen approach to upload submissions works and also that your implementation of the scoring logic is correct. The tester will verify that the returned String contains a valid URL and that its content is accessible, i.e. the tester is able to download the file from the returned URL. If your file is valid, it will be evaluated, and detailed score values will be available in the test results. The example evaluation is based on a small subset of the training data; these 3 sample coupons are used:

  • Pot_Glass_178.8_Ab
  • War_AA_202.5_Ab
  • War_Acrylic_22.54_Sv

Example submissions must contain 3 lines of text. Though recommended, it is not mandatory to create example submissions. The scores you achieve on example submissions have no effect on your provisional or final ranking.

Final Scoring

The top 10 competitors according to the provisional scores will be invited to the final testing round. The details of the final testing are described in a separate document.

Your solution will be subjected to three tests:

First, your solution will be validated, i.e. we will check if it produces the same output file as your last submission, using the same input files used in this contest. Note that this means that your solution must not be improved further after the provisional submission phase ends. (We are aware that it is not always possible to reproduce the exact same results. E.g. if you do online training then the difference in the training environments may result in different number of iterations, meaning different models. Also you may have no control over random number generation in certain 3rd party libraries. In any case, the results must be statistically similar, and in case of differences you must have a convincing explanation why the same result can not be reproduced.)

Second, your solution will be tested against a new set of coupons.

Third, the resulting output from the steps above will be validated and scored. The final rankings will be based on this score alone.

Competitors who fail to provide their solution as expected will receive a zero score in this final scoring phase, and will not be eligible to win prizes.

Additional Resources

Plenty of relevant papers, reference material and pointers to more information can be downloaded from the Morgoth's Crown microsite resources section.

General Notes

  • This match is NOT rated.
  • Teaming is allowed. Topcoder members are permitted to form teams for this competition. After forming a team, Topcoder members of the same team are permitted to collaborate with other members of their team. To form a team, a Topcoder member may recruit other Topcoder members, and register the team by completing this Topcoder Teaming Form. Each team must declare a Captain. All participants in a team must be registered Topcoder members in good standing. All participants in a team must individually register for this Competition and accept its Terms and Conditions prior to joining the team. Team Captains must apportion prize distribution percentages for each teammate on the Teaming Form. The sum of all prize portions must equal 100%. The minimum permitted size of a team is 1 member; the maximum permitted team size is 5 members. Only team Captains may submit a solution to the Competition. Topcoder members participating in a team will not receive a rating for this Competition. Notwithstanding Topcoder rules and conditions to the contrary, solutions submitted by any Topcoder member who is a member of a team but is not the Captain of the team may be deleted and are ineligible for award. The deadline for forming teams is 11:59pm ET on the 21st day following the date that Registration and Submission opens as shown on the Challenge Details page. Topcoder will prepare a Teaming Agreement for each team that has completed the Topcoder Teaming Form, and distribute it to each member of the team. Teaming Agreements must be electronically signed by each team member to be considered valid. All Teaming Agreements are void, unless electronically signed by all team members by 11:59pm ET of the 28th day following the date that Registration and Submission opens as shown on the Challenge Details page. Any Teaming Agreement received after this period is void. Teaming Agreements may not be changed in any way after signature.
The registered teams will be listed in the contest forum thread titled "Registered Teams".
  • Organizations such as companies may compete as one competitor, but they will not be eligible for a prize. However, organizations can place in the standings, if they release their implementation code and methods.
  • Relinquish - Topcoder is allowing registered competitors or teams to "relinquish". Relinquishing means the member will compete, and we will score their solution, but they will not be eligible for a prize. This is to allow SILMARILS performers to compete. Once a person or team relinquishes, we post their name to a forum thread labeled "Relinquished Competitors". Relinquishers must submit their implementation code and methods to maintain leaderboard status.
  • In this match you may use any programming language and libraries, including commercial solutions, provided Topcoder is able to run it free of any charge. You may use open source languages and libraries provided they are equally free for your use, use by another competitor, or use by the client. If your solution requires licenses, you must have these licenses and be able to legally install them in a testing VM (see "Requirements to Win a Prize" section). Submissions will be deleted/destroyed after they are confirmed. Topcoder will not purchase licenses to run your code. Prior to submission, please make absolutely sure your submission can be run by Topcoder free of cost, and with all necessary licenses pre-installed in your solution. Topcoder is not required to contact submitters for additional instructions if the code does not run. If we are unable to run your solution due to license problems, including any requirement to download a license, your submission might be rejected. Be sure to contact us right away if you have concerns about this requirement.
  • If your solution includes licensed software (e.g. commercial software, open source software, etc), you must include the full license agreements with your submission. Include your licenses in a folder labeled "Licenses". Within the same folder, include a text file labeled README.txt that explains the purpose of each licensed software package as it is used in your solution.
  • External data sets and pre-trained models are allowed for use in the competition provided the following are satisfied:
    • The external data and pre-trained models are unencumbered with legal restrictions that conflict with their use in the competition.
    • The data source or data used to train the pre-trained models is defined in the submission description.
  • Use the match forum to ask general questions or report problems, but please do not post comments and questions that reveal information about possible solution techniques.

Requirements to Win a Prize

Progress prizes

To encourage early participation bonus prizes will be awarded to contestants who reach a certain threshold after week 2, 4 and 6 of the competition. The threshold for the first such prize is 600,000. Thresholds for the 2nd and 3rd such prizes will be announced later in the contest forums.

Every competitor whose provisional score is above the threshold will receive a share of the prize fund for that milestone ($1,000 at week 2, $500 each at weeks 4 and 6), divided evenly among all competitors who reached the threshold. To determine these prizes, a snapshot of the leaderboard will be taken exactly 2 weeks, 4 weeks and 6 weeks after the launch of the contest.

Final prizes

In order to receive a final prize, you must do all the following:

Achieve a score in the top 5 according to final test results. See the "Final Scoring" section above.

Once the final scores are posted and winners are announced, the prize winner candidates have 7 days to submit a report outlining their final algorithm, explaining the logic behind it and the steps of its approach. You will receive a template that helps you create your final report.

If you place in a prize winning rank but fail to do any of the above, then you will not receive a prize, and it will be awarded to the contestant with the next best performance who did all of the above.

Additional Eligibility

SILMARILS Program Performers and Affiliates will be allowed to participate in this challenge. IARPA's SILMARILS performers, government partners, and their affiliates are welcome to participate in the challenge, but will need to forego the monetary prizes. Winners will still be publicly recognized by IARPA in the final winner announcements based on their performance. Throughout the challenge, Topcoder's online leaderboard will display your rankings and accomplishments, giving you various opportunities to have your work viewed and appreciated by stakeholders from industry, government and academic communities.

 

Definition

    
Class: SpectrumPredictor
Method: getAnswerURL
Parameters: (none)
Returns: String
Method signature: String getAnswerURL()
(be sure your method is public)
    
 


This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2010, TopCoder, Inc. All rights reserved.