|Robot Detection and Tracking of Objects in Industrial Settings
The best 3 performers in this contest according to system test results will receive the following prizes:
- 1st place: $3000
- 2nd place: $2000
- 3rd place: $1000
This problem builds upon the in-progress RobotVisionTracker match, by using a larger dataset that covers a wider variety of potential scenarios,
which will provide better overall results. The training data made available for the first contest is included as part of this match, and can and
should be used here. Contestants are welcome and encouraged to participate in both matches, and may freely use some or all of the same code for
each, as they feel appropriate. (Take caution, however, that as this match adds data not present in the original match, there is no guarantee that
the additional training data provided here will be useful for the former.)
Firm A is building a next generation robotics platform that will change the game in field service operations
including asset inspection and repair. Firm A has defined a host of high-value use cases and applications across
industry that will help field engineers and other industrial workers be more productive and, more importantly,
perform their jobs safely.
In one example high-value use case, the company would like a robot to detect and track a freight railcar brake
release handle, the object of interest (OOI), so that the robot can grasp the handle.
Your task is to develop an algorithm that can detect and track the OOI in video frames. The OOI is typically made of 0.5-inch round steel rod, bent to form a handle. Examples of the varieties of the OOI appear below. The point marked in blue is the point to be identified when it is present in a frame. More details follow.
Your algorithm will receive the following input:
- Random samples of contiguous frames of videos shot in stereo ("Training Data"). Some frames will contain the OOI, others will not, and the samples will have 10 frames each.
- The Training Data consists of videos whose frames were all shot at 640x480 resolution with the same camera.
- Stereo camera calibration was performed with OpenCV. More information on camera calibration can be found here. The left and right camera calibration parameters can be downloaded here and here.
- If the OOI appears in the sample, then it will be marked in every frame as a point (x,y) according to the following convention, which defines the "ground truth" of the OOI's presence and location. As seen from the convention, the OOI is marked in 3 scenarios:
- When the OOI is in direct line of sight.
- When it is occluded by something in front of it.
- When the brake lever itself is occluding the point.
Please review the convention PDF linked above.
The Training Data can be downloaded here.
Your task is to implement training and testing methods, whose signatures are detailed below.
imageDataLeft and imageDataRight contain the unsigned 24-bit image data. The value of each pixel is a single number calculated as 2^16 * Red + 2^8 * Green + Blue, where Red, Green and Blue are the 8-bit RGB components of that pixel. The size of the image is 640 by 480 pixels. If x is the column and y the row of a pixel, its value is found at index [x + y*640] of the imageData array.
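As a minimal sketch, the pixel layout above can be decoded with a helper like the following (the function name `pixelAt` and the `RGB` struct are illustrative, not part of the required interface):

```cpp
#include <vector>

// Unpack one 24-bit pixel from the flat 640x480 buffer.
// Each element encodes 2^16*Red + 2^8*Green + Blue.
struct RGB { int r, g, b; };

RGB pixelAt(const std::vector<int>& imageData, int x, int y) {
    int v = imageData[x + y * 640];   // row-major: x = column, y = row
    RGB p;
    p.r = (v >> 16) & 0xFF;
    p.g = (v >> 8) & 0xFF;
    p.b = v & 0xFF;
    return p;
}
```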
First, your training method will be called multiple times; you can use it to train your algorithm on the supplied training data. Data for each video frame of multiple videos will be passed to your method sequentially, and all available frames of each training video will be supplied. The number of frames may differ from video to video. The ground-truth location of the OOI in each frame is provided in leftX, leftY, rightX and rightY; a negative value indicates that the OOI is not present in that frame. If your training method returns the value 1, no more data will be passed to your algorithm and the testing phase will begin.
Once all training images have been supplied, doneTraining() will be called. This will signal that your solution should do any further processing based on the full set of training data.
Finally, your testing method will be called 50 times. The first 10 calls will contain contiguous frames from one video, the next 10 calls contiguous frames from a second video, and so on. The array you return should contain exactly 4 values; returning any point outside the bounds of the image indicates that you did not detect the OOI in that image. The elements of your return value are:
- leftX - estimated x-coordinate for the point in the left image
- leftY - estimated y-coordinate for the point in the left image
- rightX - estimated x-coordinate for the point in the right image
- rightY - estimated y-coordinate for the point in the right image
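The training/testing flow above can be sketched as a solver class like the one below. The exact signatures come from the official problem class definition, so treat these parameter lists as assumptions; the bodies are placeholders only:

```cpp
#include <vector>
using std::vector;

// Hypothetical sketch of the required interface; the exact method
// signatures are defined by the official problem statement.
class RobotVisionTracker {
public:
    // Called once per training frame pair; return 1 to stop receiving
    // training data, 0 to keep receiving it.
    int training(int videoIndex, int frameIndex,
                 const vector<int>& imageDataLeft,
                 const vector<int>& imageDataRight,
                 int leftX, int leftY, int rightX, int rightY) {
        // Negative ground-truth coordinates mean the OOI is absent.
        return 0;
    }

    int doneTraining() {
        // Post-process the accumulated training set here.
        return 0;
    }

    // Called 50 times; must return exactly 4 values:
    // {leftX, leftY, rightX, rightY}. An out-of-bounds point means
    // "OOI not detected in this frame".
    vector<int> testing(const vector<int>& imageDataLeft,
                        const vector<int>& imageDataRight) {
        return vector<int>{-1, -1, -1, -1}; // placeholder: "not found"
    }
};
```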
The videos used for testing as well as the starting frame within the video will be selected randomly, so it is possible to have repetitions or intersection of frames during the testing phase.
Testing and Scoring
There is 1 example test, 5 provisional tests and at least 10 system tests.
177 videos have been split into three sets: 34, 68 and 75. The first 34 videos are available for download for local testing and example testing.
- Example tests: 25 (out of the set of 34) videos used for training, 9 for testing.
- Provisional tests: 34 videos used for training, 68 for testing.
- System tests: 34 videos used for training, 75 for testing.
Your algorithm's performance will be quantified as follows.
xr, yr: True x and y-coordinates for the OOI in the image (in units of pixels)
xe, ye: Estimated x and y-coordinates for the OOI in the image (in units of pixels)
dr = sqrt( (xe - xr)*(xe - xr) + (ye - yr)*(ye - yr) )
leftR[DIST] = percentage of left image frames whose dr <= DIST pixels
rightR[DIST] = percentage of right image frames whose dr <= DIST pixels
Note: In case of the OOI not being visible in the frame, the detection will be counted as correct if your algorithm correctly detects that the OOI is not in the frame.
T = total CPU processing time for all testing frames in seconds
AccuracyScore = 10000.0 * (50.0*(leftR[D1]+rightR[D1]) + 35.0*(leftR[D2]+rightR[D2]) + 15.0*(leftR[D3]+rightR[D3])), where D1 < D2 < D3 are three fixed pixel-distance thresholds, so tighter estimates carry the largest weight.
TimeMultiplier = 1.0 (if T <= 3.33), 1.3536 - 0.2939 * Ln(T) (if 3.33 < T <= 100.0), 0.0 (if T > 100.0)
Score = AccuracyScore * (1.0 + TimeMultiplier)
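The distance metric and the time multiplier can be written directly from the definitions above (the function names are illustrative). Note that the multiplier is approximately continuous: 1.3536 - 0.2939*ln(3.33) is about 1.0, and at T = 100 it falls to nearly 0.

```cpp
#include <cmath>

// Pixel distance between the true point (xr, yr) and the estimate (xe, ye).
double dr(double xr, double yr, double xe, double ye) {
    return std::sqrt((xe - xr) * (xe - xr) + (ye - yr) * (ye - yr));
}

// TimeMultiplier as defined above; T is total CPU time in seconds.
double timeMultiplier(double T) {
    if (T <= 3.33)  return 1.0;
    if (T <= 100.0) return 1.3536 - 0.2939 * std::log(T);
    return 0.0;
}
```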
You can see these scores for example test cases when you make example test submissions. If your solution fails to produce a proper return value, your score for this test case will be 0.
The overall score on a set of test cases is the arithmetic average of the scores on the single test cases in the set. The match standings display overall scores on provisional tests for all competitors who have made at least one full test submission. The winners are the competitors with the highest overall scores on the system tests.
An offline tester/visualizer tool is available.
Minimum Score Criteria
To be eligible for a prize, your submission needs to attain a minimum score of 700000 in System Testing.
Special rules and conditions
- The allowed programming languages are C++, Java, C# and VB.
- Be sure to see the official rules for details about open source library usage.
- In order to receive the prize money, you will need to fully document your code and explain your algorithm. If any parameters were obtained from the training data set, you will also need to provide the program used to generate those parameters; there is no restriction on the programming language used to generate them. Note that this documentation should not be submitted anywhere during the coding phase. Instead, if you win a prize, a TopCoder representative will contact you directly to collect this material.
- You may use any external (outside of this competition) source of data to train your solution.