In Part 1, I looked at how to generate a single random number or event. This provides an essential toolbox for test case generation, but many marathon matches require more complicated lists or sets of items, and in Part 2 we'll see how to tackle those. I will be using Scruffle, the first-round problem from TCCC 2007, as an example where several of the techniques may be applied. Since this problem did not provide a test case generator, it was a real advantage to be able to write your own. Random seeds Any decent random number library should have a means to explicitly specify the seed. The default varies; in C++ there is a fixed default seed, while in other languages the seed is initialised from some external source of variability, such as the time of day. If you use random numbers in your submission, then explicitly setting a fixed seed will ensure that your program will make the same random decisions every time, making it possible to compare different variants of the program. You can make different variants of your program more comparable by always consuming a fixed number of random numbers in each piece of code, even if this means throwing some of them away. If you don't, then a change in an early piece of code may desynchronise the random number streams in the two versions of the program, thus affecting all random numbers chosen later. In Scruffle, for example, some number of blocks must be made special (obstacles, double-word or triple-word blocks), and the rest each receive a random weight from 0 to 9. Based on this principle, I picked a random weight for every block, even when I knew it would not be used. When I later discovered that I was generating the wrong number of triple-word blocks due to a copy-paste error, I could fix the bug and know that everything else about the generated test cases would be the same. While this isn't so important when generating test cases off-line, if you incorporate randomness into a submission then this would reduce the amount of noise in your results due to randomness. I should clarify what I mean by a “fixed” number of random numbers. The algorithm I gave for generating the standard normal distribution in part 1 checks if the random numbers it generates falls in an acceptable range, and if not, it picks some new numbers. However, as long as this function is always called at the same point in the random number stream, it will generate the same numbers initially, and so it will always consume a fixed amount of the random number stream. The cases you should try to avoid are those where you branch on something that is likely to change between versions of your code, particularly a tuning parameter, and you generate a different number of random numbers in each branch. Random permutations Fortunately, there is a simple and efficient algorithm that gives a uniform distribution. Assume there are N elements to be arranged in an array. Any one of them could be the first element, so pick one at random and swap it to the front. You've now changed the relative order of the remaining elements, but the original order is irrelevant anyway. Now, we are left with N - 1 elements, which we need to permute, and it is not hard to see that all (N - 1)! permutations ought to be equally likely. So we can apply the same procedure recursively until all the elements are permuted. Random subsets
In this case, there are several practical alternatives, with some more applicable to certain situations than others. Pick items to exclude Extend the permutation algorithm Decide for each item It is also possible to pick a random set of K items from N items when N isn't known in advance. In Scruffle, a word could be chosen multiple times from the dictionary, but imagine that this was not case. Then this allows us to simply pull a line at a time out of a dictionary, without doing a first pass to determine the size of the dictionary, or storing the whole dictionary in memory. We keep a list of K chosen items, which at every stage is a valid random sample of the words we have read so far. Let's say we had N-1 words loaded, and we now read an Nth one. With probability K/N, pick this word, and use it to replace one of the previously selected words chosen at random. Proving that every possible subset has equal probability is left as an exercise for the reader. Note that while the list of chosen items is not ordered, it is also not uniformly random. If you want the items in random order, you will need to randomly permute them as described above. Which is right? Conclusions
|
|