JOIN
Get Time
features   
Generating random numbers with C++ TR1

Author
By bmerry
TopCoder Member

Introduction

In previous articles, I have looked at how to generate various random distributions of one variable, and random collections of variables. Those articles assumed only a random number generator that produced random numbers in some fixed range, such as the C rand() function. In this article, I'll look at a specific random number library, and how it can simplify random number generation.

What is TR1?

The current version of the C++ standard was ratified in 1998. Since then, there has been a lot of work towards defining a new version of the standard, currently with the working title of C++0x. This revision will add assorted new features to the language itself, but in the meantime, a subset of the proposals have been put into an interim standard, known as TR1. TR1 contains no new language syntax, only new additions to the standard libraries. The additions are a mix of various useful things, including smart pointers, hash tables, regular expressions, and, of course, random number generators.

Because there are no changes to the language, TR1 can easily be bolted on to existing compilers. GCC started implementing TR1 with the version 4 series, and the random number generation facilities are included from GCC 4.2. Microsoft also provide beta TR1 support as an extension pack to Visual Studio 2008, although I haven't tried it so I don't know if it includes random number generation.

I should also mention that I am far from being an expert on TR1, and I would encourage the Topcoder community to contribute their knowledge and experience in the forums. What I describe below is merely my experience in using the implementation in GCC 4.2.

Generators, engines and distributions

One of the problems with random number generation in C is that there is only one random source: the rand function. In a marathon match, particularly interactive problems, it is often desirable to be able to reliably reproduce the same sequence of random numbers, even if some parts of your code change. This way, you can separate the effect of the change you made from the effects of randomness. However, with a single random source, adding or removing some steps might have the side effect of changing the position in the sequence of random numbers, which will change the random numbers for the rest of the program execution.

TR1 identifies three abstract concepts related to random number generation: engines, generators and distributions. An engine is a pseudo-random number generator with reproducable behaviour. Several types of engines are defined, including the standard linear congruential generator used by many implementations of rand, as well as the popular Mersenne Twister. Engines can be seeded, copied and saved to file and restored, making it possible to control the interactions between different pieces of code on the random source.

A uniform random number generator is a more abstract concept: it simply has to generate numbers in some range (either integer or floating point) with a uniform distribution. Engines are required to be uniform random number generators, but other source are possible, such as a hardware or operating system random number generator that uses external events as a source of cryptographically secure randomness.

Finally, a distribution is the user-friendly front end to the whole infrastructure. It takes randomness from a generator and formats it into your distribution of choice - be it a uniform distribution with a range you specify, or a normal distribution, or something else.

Let's dive in with an example:

#include <tr1/random>
#include <iostream>
#include <cstdlib>

using namespace std;
using namespace std::tr1;

int main()
{
    variate_generator<mt19937, normal_distribution<> > r(mt19937(time(NULL)), normal_distribution<>(0.5, 2.0));

    for (int i = 0; i < 5; i++)
        cout << r() << endl;
    return 0;
}

Now, what does it all mean? Let's take it one bit at a time. Firstly, the header file is tr1/random, and the classes are all defined in the std::tr1 namespace. The variate_generator template is a handy wrapper class that combines an engine and a distribution into a single function object.

TR1 provides a number of predefined engine templates, which take various parameters controlling the pseudo-random generation. However, it's difficult to remember the magic numbers that make good random number generation, so it also provides typedefs for various common instances. One of these is mt19937, which is the standard Mersenne Twister (the 19937 in the name refers to the period, which is 219937 - 1).

The probability distribution here is the normal distribution. This is also a template (depending on which floating-point type you would like to generate), but it defaults to double and so we do not need to supply any template parameters. The constructor parameters are the mean and standard deviation.

Finally, we put it all to work, outputting five random numbers with a normal distribution.

Multiple distributions

This is fine if we only ever need to generate random numbers from one distribution, but what if we need more? You could just copy-and-paste the declaration and then change it. However, you'll probably notice that unless a second passes between the constructors being called, the engines will be generated with the same seed, and hence will produce the same random numbers. A second attempt may be to construct just one engine, and pass that to the constructors for each variate_generator. Unfortunately, that is no better, since each variate_generator takes a copy of the engine, so they will again be generating the same random numbers. The TR1 specification indicates that the first template parameter can be a reference or pointer to an engine type, which presumably would solve the problem, but GCC 4.2 appears not to support this.

What else can one do in the meantime? Several options spring to mind:

Use one engine to seed others

Most high-quality random number engines have a lot of internal state (mt19937 stores 624 integers), so creating and seeding one can be expensive. This also means that seeding with only a single number is less than ideal. TR1 allows a function object to be used as a seed, which the engine will call as many times as necessary. Here is an example of creating two normal distributions:

mt19937 seeder(time(NULL));
variate_generator<mt19937, normal_distribution<> > r1(mt19937(seeder), normal_distribution<>(0.5, 2.0));
variate_generator<mt19937, normal_distribution<> > r2(mt19937(seeder), normal_distribution<>(0.5, 2.0));

However, I'm not entirely certain that this will not cause some correlation between shifted versions of the two sequences.

Don't use variate_generator

A distribution can be used as a function object, taking an engine as a parameter. Unfortunately, GCC's implementation seems to require certain types of generator (integral or real-valued) depending for some distributions. In particular, normal_distribution doesn't work with an integral generator. The good news is that a real-valued generator seems to work for all the distributions I've tried (I haven't tried them all, however). One can either use one of the real-valued engines provided by TR1, or convert an integral engine into a real-valued generator using variate_generator:

variate_generator<mt19937, uniform_real<> > urng(mt19937(time(NULL)), uniform_real<>());

Conclusions

TR1 solves several major short-comings of the aging C random number generator: it gives explicit control over the state of engines, defines portable engines that yield reproducable behaviour between compilers, and provides a number of discrete and continuous non-uniform distributions. However, there are still some quirks and inconveniences, at least under GCC. It appears that the random number interface in C++0x is still in development, and presumably the GCC developers are striving to improve their implementation, so we may soon see a day when random number generation will be both simple and powerful.