Robonaut 2 Tool Manipulation Contest
The best 5 performers in this contest according to system test results will receive the following prizes:
- 1st place: $5000
- 2nd place: $2500
- 3rd place: $1500
- 4th place: $750
- 5th place: $250
Requirements to Win a Prize
Achieve a score in the top 5, according to system test results. See the scoring section below.
Within 7 days of the announcement of the challenge winners, submit a complete report, at least 2 pages long, that outlines your final algorithm, explains the logic behind it and the steps of its approach, and describes how to install any required libraries and run it. The required content appears in the report section below.
If you place in the top 5 but fail to do any of the above, then you will not receive a prize, and it will be awarded to the contestant with the next best performance who did all of the above.
Robonaut 2 ("R2"), a humanoid robot that operates both on Earth and on the International Space Station, commonly uses tools. For example, it manages inventory using an RFID reader and fastens bolts with a drill. In order to use a tool, R2 relies on an algorithm to determine a 3D representation of the tool. The algorithm works with the robot's control system and allows R2 to create a plan for grasping objects and completing its tasks.
There exist several algorithms that could be used to determine a 3D representation of the tool. However, the robot employs an older, less capable set of vision sensors, due to its space heritage and having been exposed to high levels of environmental radiation over time. Many existing algorithms assume that the vision data being used is of relatively high resolution, detail, and quality, and such algorithms are not effective when used with the grade of vision data available to R2. As a result, the R2 team needs you to create vision algorithms for determining the 3D representation of different tools that will be effective with noisy, stereo vision data.
The training package can be downloaded here. It contains the following:
- A string-based mesh file containing a 3D wire-frame model of each tool. The mesh file is in ply format. More information about the ply format can be found here.
- 88 stereo image pairs of the tools (176 images in total).
- For each stereo pair, the 3D representation of the object, given as a 3-element translation vector and a 4-element rotation quaternion.
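As a rough illustration of reading the model file, the sketch below pulls the vertex positions out of the lines of a ply file. It assumes the file is ASCII ply, that the vertex element is the first element declared in the header, and that x, y, z are the first three vertex properties; none of this is confirmed by the package description, and the function name parse_ply_vertices and the sample data are made up for the example.

```python
def parse_ply_vertices(lines):
    """Return a list of (x, y, z) tuples from the lines of an ASCII ply file.

    Assumes the vertex element is declared first in the header and that
    x, y, z are the first three properties of each vertex.
    """
    vertex_count = 0
    header_end = None
    for i, line in enumerate(lines):
        tokens = line.split()
        if tokens[:2] == ["element", "vertex"]:
            vertex_count = int(tokens[2])
        elif tokens == ["end_header"]:
            header_end = i
            break
    if header_end is None:
        raise ValueError("no end_header line found")

    vertices = []
    body = lines[header_end + 1 : header_end + 1 + vertex_count]
    for line in body:
        x, y, z = (float(v) for v in line.split()[:3])
        vertices.append((x, y, z))
    return vertices


# Hypothetical miniature model: one triangle.
sample = [
    "ply",
    "format ascii 1.0",
    "element vertex 3",
    "property float x",
    "property float y",
    "property float z",
    "element face 1",
    "property list uchar int vertex_indices",
    "end_header",
    "0 0 0",
    "10 0 0",
    "0 10 0",
    "3 0 1 2",
]
vertices = parse_ply_vertices(sample)
```

Face data is ignored here because only the vertices are needed to place model points in 3D space.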
An additional tool included in provisional and system testing, and a few sample images (low resolution) are available here.
A visualization tool is available and can be downloaded here.
Note that you are encouraged to use the 3D model in your solution, because the center point of the object is determined by the 3D model.
The 3D position (0,0,0) is defined as the midpoint between the focal points of the left and right cameras. The cameras look along the Z-axis, with the X-axis horizontal and the Y-axis vertical. Positions are given in millimetres (mm).
An example stereo image pair rendered with the visualization tool can be seen below. The green points represent the vertices of the 3D model transformed to the ground truth location. The red points represent an example of the output of an algorithm.
Your task is to implement trainingModel, trainingPair, doneTraining and testingPair methods, whose signatures are detailed below.
int[] leftImage and int[] rightImage contain the unsigned 24-bit image data. The data of each pixel is a single number calculated as 2^16 * Red + 2^8 * Green + Blue, where Red, Green and Blue are the 8-bit RGB components of this pixel. The size of each image is 1600 by 1200 pixels. Let x be the column and y be the row of a pixel (top-left = 0,0); then the pixel value can be found at index [x+y*1600] of the image arrays.
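The pixel layout above can be sketched as follows. The helper names pixel_index, unpack_rgb and pack_rgb are illustrative only and are not part of the contest interface.

```python
WIDTH, HEIGHT = 1600, 1200


def pixel_index(x, y):
    """Index of column x, row y in a flat image array (top-left is 0,0)."""
    return x + y * WIDTH


def unpack_rgb(value):
    """Split a packed 24-bit pixel into its (red, green, blue) components."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


def pack_rgb(red, green, blue):
    """Inverse of unpack_rgb: 2^16 * red + 2^8 * green + blue."""
    return (red << 16) | (green << 8) | blue


# Build a blank image and set one pixel, then read it back.
image = [0] * (WIDTH * HEIGHT)
image[pixel_index(3, 2)] = pack_rgb(200, 100, 50)
rgb = unpack_rgb(image[pixel_index(3, 2)])
```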
The first 3 elements of double[] groundTruth contain a 3-element translation vector, and the next 4 contain a 4-element Quaternion, which together give the 3D representation of the object in each stereo image pair. Some objects have two ground truths due to object symmetry. Those objects will have a groundTruth of length 14 (the first ground truth in the first 7 elements and the second ground truth in the last 7 elements).
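A small sketch of unpacking groundTruth into one or two poses is shown below. The function name split_ground_truth is hypothetical, and the quaternion is assumed to be stored in the same (qr, qi, qj, qk) order used for the return value of testingPair.

```python
def split_ground_truth(ground_truth):
    """Split a 7- or 14-element ground truth into (translation, quaternion) pairs."""
    if len(ground_truth) not in (7, 14):
        raise ValueError("groundTruth must have 7 or 14 elements")
    poses = []
    for start in range(0, len(ground_truth), 7):
        translation = tuple(ground_truth[start : start + 3])
        quaternion = tuple(ground_truth[start + 3 : start + 7])
        poses.append((translation, quaternion))
    return poses


# One pose for an ordinary object, two for a symmetric one (made-up numbers).
single = split_ground_truth([10.0, 20.0, 900.0, 1.0, 0.0, 0.0, 0.0])
double = split_ground_truth(
    [10.0, 20.0, 900.0, 1.0, 0.0, 0.0, 0.0,
     10.0, 20.0, 900.0, 0.0, 0.0, 0.0, 1.0]
)
```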
The trainingModel method will be called first and will be fed the string-based, ply-formatted mesh file containing a 3D model of the tool for that test. This method call defines the tool that will be used throughout the remainder of the test run. The more your algorithm relies on the model files and the fewer training images it uses, the higher your score. See the "Testing and Scoring" section below. Next, your trainingPair method will be called multiple times. You can use this method to train your algorithm on the pairs of stereo images of the tools. If you return 1 from trainingModel, your algorithm will not receive any image pairs. Similarly, if your trainingPair method returns 1, then no more image data will be passed to your algorithm, and the testing phase will begin.
Once all training images have been supplied, or your trainingModel or trainingPair method has returned 1 to end the reception of training data, doneTraining will be called. This signals that your solution has finished receiving training data and lets it take any action necessary to prepare to receive test data.
Finally, your testingPair method will be called for each testing image pair in the test. The array you return must contain exactly 7 values, in the following order:
- Xe : estimated X-coordinate of the translation vector
- Ye : estimated Y-coordinate of the translation vector
- Ze : estimated Z-coordinate of the translation vector
- qre : estimated R-element of the Quaternion
- qie : estimated I-element of the Quaternion
- qje : estimated J-element of the Quaternion
- qke : estimated K-element of the Quaternion
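The call sequence described above might be sketched as the class below. Only the four method names come from this statement; the class name, the parameter lists, the use of 0 as the "continue" return value, and the return value of doneTraining are assumptions made for illustration. The "algorithm" is a placeholder that returns the mean training translation with an identity rotation.

```python
class PoseEstimatorSketch:
    """Placeholder solution showing the order in which the methods are called."""

    def __init__(self):
        self.model_lines = []
        self.training_poses = []
        self.mean_translation = [0.0, 0.0, 1000.0]  # arbitrary default guess

    def trainingModel(self, ply):
        # Called first; keep the ply lines for later use.
        self.model_lines = list(ply)
        return 0  # assumed: 0 means "send me training pairs"

    def trainingPair(self, leftImage, rightImage, groundTruth):
        # Store only the first ground truth (first 7 elements).
        self.training_poses.append(list(groundTruth[:7]))
        # Returning 1 ends training early; here we stop after two pairs.
        return 1 if len(self.training_poses) >= 2 else 0

    def doneTraining(self):
        # Prepare for testing: average the training translations.
        if self.training_poses:
            n = len(self.training_poses)
            self.mean_translation = [
                sum(pose[i] for pose in self.training_poses) / n for i in range(3)
            ]
        return 0  # assumed return value

    def testingPair(self, leftImage, rightImage):
        # Exactly 7 values: Xe, Ye, Ze, qre, qie, qje, qke.
        return self.mean_translation + [1.0, 0.0, 0.0, 0.0]


solver = PoseEstimatorSketch()
solver.trainingModel(["ply", "format ascii 1.0", "end_header"])
stop = solver.trainingPair([], [], [10.0, 20.0, 900.0, 1.0, 0.0, 0.0, 0.0])
solver.doneTraining()
estimate = solver.testingPair([], [])
```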
The source code of the visualization tool that you can download here contains a class called Transform. The class contains several useful methods that you can freely use.
- double[] transform3Dto2D(double x, double y, double z). The transform3Dto2D method takes a 3D position (x,y,z) as input and returns an array that contains the 2D pixel positions for the left and right images. Let R be the returned array. The pixel coordinate of the given 3D point will be (R[0], R[1]) on the left image and (R[2], R[3]) on the right image. The code is commented and explains each step of the process.
- double[] rotate(double x, double y, double z, double qr, double qi, double qj, double qk). The rotate method rotates a given 3D position (x,y,z) by the given Quaternion (qr, qi, qj, qk). The method returns a 3-element array that contains the rotated point in 3D space.
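For readers working outside Java, a quaternion rotation can be sketched as below. This is a standard unit-quaternion rotation (v' = q v q^-1), not a copy of the Transform class, so its sign conventions may differ from the visualizer's; check against the visualizer source before relying on it. The apply_pose helper, which rotates a model vertex and then adds the translation, is a hypothetical name and its rotate-then-translate order is an assumption.

```python
import math


def rotate(x, y, z, qr, qi, qj, qk):
    """Rotate the point (x, y, z) by the unit quaternion (qr, qi, qj, qk)."""
    xr = ((1 - 2 * (qj * qj + qk * qk)) * x
          + 2 * (qi * qj - qk * qr) * y
          + 2 * (qi * qk + qj * qr) * z)
    yr = (2 * (qi * qj + qk * qr) * x
          + (1 - 2 * (qi * qi + qk * qk)) * y
          + 2 * (qj * qk - qi * qr) * z)
    zr = (2 * (qi * qk - qj * qr) * x
          + 2 * (qj * qk + qi * qr) * y
          + (1 - 2 * (qi * qi + qj * qj)) * z)
    return [xr, yr, zr]


def apply_pose(vertex, pose):
    """Rotate a model vertex, then translate it (assumed order)."""
    tx, ty, tz, qr, qi, qj, qk = pose
    xr, yr, zr = rotate(*vertex, qr, qi, qj, qk)
    return [xr + tx, yr + ty, zr + tz]


# A 90 degree rotation about the Z-axis maps the X-axis onto the Y-axis.
half = math.sqrt(0.5)
rotated = rotate(1.0, 0.0, 0.0, half, 0.0, 0.0, half)
placed = apply_pose((1.0, 0.0, 0.0), [10.0, 20.0, 30.0, half, 0.0, 0.0, half])
```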
Testing and Scoring
There are 4 example tests, 2 provisional tests and 3 system tests.
The breakdown of example, provisional and system tests can be seen in the table below. The tools used for provisional and system tests may or may not overlap with the example tools; this information is kept hidden.
Test | Tool | Training Pairs | Testing Pairs | Total Pairs
Example 1 | Drill | 6 | 14 | 20
Example 2 | EVA Handrail | 6 | 10 | 16
Example 3 | RFID Reader | 7 | 23 | 30
Example 4 | Softbox | 7 | 15 | 22
Provisional 1 | HIDDEN | 2 | 8 | 10
Provisional 2 | HIDDEN | 3 | 20 | 23
System 1 | HIDDEN | 3 | 7 | 10
System 2 | HIDDEN | 4 | 20 | 24
System 3 | HIDDEN | 6 | 12 | 18
Your algorithm's performance will be quantified as follows (Please have a look at the source code of the visualizer for the exact implementation details of the scoring code):
- Xo, Yo, Zo : Ground truth (X,Y,Z) coordinates of the translation vector.
- Xe, Ye, Ze : Estimated (X,Y,Z) coordinates of the translation vector.
- qro, qio, qjo, qko : Ground truth (qr,qi,qj,qk) elements of the Quaternion.
- qre, qie, qje, qke : Estimated (qr,qi,qj,qk) elements of the Quaternion.
- T = sqrt((Xo - Xe)^2 + (Yo - Ye)^2 + (Zo - Ze)^2) : The positional error.
- GTlen = sqrt(Xo^2 + Yo^2 + Zo^2) : The distance to the ground truth position.
- ScoreT = max(0, 1 - (10*T / GTlen)^2) : Translation score, to be within 10 percent of the distance to the ground truth position.
- q = (qro, qio, qjo, qko) * (qre, qie, qje, qke)^-1 : Quaternion multiplication with the inverse of the estimated Quaternion.
- A = 1 - abs(q3) : Angular error of the rotation. q3 is the qr part of Quaternion q.
- MaxAngleError = 1 - cos(10*PI/180) : Maximum allowed angular error.
- ScoreR = max(0, 1 - (A / MaxAngleError)^2) : Rotational score, to be within 10 degrees of the ground truth rotation.
- ScorePair = 1000000 * ScoreR * ScoreT / (1 + 0.1 * TrainingImagesUsed) : Score for an image pair.
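The formulas above can be transcribed into a short sketch for local experimentation. It follows the listed definitions literally for a single 7-element ground truth; how the official scorer treats objects with two ground truths is not stated here, so that case is left out, and the visualizer source remains the authoritative implementation. The function names are illustrative.

```python
import math


def quat_multiply(a, b):
    """Hamilton product of two quaternions given as (qr, qi, qj, qk)."""
    ar, ai, aj, ak = a
    br, bi, bj, bk = b
    return (
        ar * br - ai * bi - aj * bj - ak * bk,
        ar * bi + ai * br + aj * bk - ak * bj,
        ar * bj - ai * bk + aj * br + ak * bi,
        ar * bk + ai * bj - aj * bi + ak * br,
    )


def quat_inverse(q):
    """Inverse of a quaternion: conjugate divided by squared norm."""
    qr, qi, qj, qk = q
    n = qr * qr + qi * qi + qj * qj + qk * qk
    return (qr / n, -qi / n, -qj / n, -qk / n)


def score_pair(truth, estimate, training_images_used):
    """ScorePair for one image pair; truth and estimate hold 7 values each."""
    xo, yo, zo = truth[:3]
    xe, ye, ze = estimate[:3]
    t = math.sqrt((xo - xe) ** 2 + (yo - ye) ** 2 + (zo - ze) ** 2)
    gt_len = math.sqrt(xo ** 2 + yo ** 2 + zo ** 2)
    score_t = max(0.0, 1.0 - (10.0 * t / gt_len) ** 2)

    q = quat_multiply(truth[3:7], quat_inverse(estimate[3:7]))
    a = 1.0 - abs(q[0])  # q[0] is the qr part of q
    max_angle_error = 1.0 - math.cos(10.0 * math.pi / 180.0)
    score_r = max(0.0, 1.0 - (a / max_angle_error) ** 2)

    return 1000000.0 * score_r * score_t / (1.0 + 0.1 * training_images_used)


truth = [0.0, 0.0, 1000.0, 1.0, 0.0, 0.0, 0.0]
perfect = score_pair(truth, truth, 0)
perfect_with_training = score_pair(truth, truth, 10)
too_far = score_pair(truth, [100.0, 0.0, 1000.0, 1.0, 0.0, 0.0, 0.0], 0)
```

In this sketch a perfect estimate with no training images scores 1000000, the same estimate after 10 training images scores half of that, and a translation error equal to 10 percent of the ground truth distance scores 0.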
Your overall score for a single test case is the mean (average) of all ScorePair scores. You can see these scores for example test cases when you make example test submissions. If your solution fails to produce a proper return value, your score for this test case will be 0.
The overall score on a set of test cases is the mean (average) of scores on single test cases from the set. The match standings display overall scores on provisional tests for all competitors who have made at least 1 full test submission. The winners are the competitors with the highest overall scores on system tests.
Special rules and conditions
- This match is rated.
- The allowed programming languages are C++, Java, C#, Python and VB.
- You can include open source code in your submission. Open source libraries under the BSD, GPL, or GPL2 license will be accepted. Other open source licenses may be accepted too; just be sure to ask us.
Your report must be at least 2 pages long, contain at least the following sections, and use the section names below.
- First Name
- Last Name
- Topcoder Handle
- Email Address
- Final Code Submission File Name
Please describe your algorithm so that we can understand it even before seeing your code. Use line references to refer to specific portions of your code. This section must contain at least the following:
- Approaches Considered
- Approach Ultimately Chosen
- Steps to Approach Ultimately Chosen, Including References to Specific Lines of Code
- Advantages and Disadvantages of the Approach Chosen
- Comments on Libraries
- Special Guidance Given to Algorithm Based on Training
- Potential Algorithm Improvements