Pseudorandom number generator

From Free net encyclopedia

A pseudorandom number generator (PRNG) is an algorithm that generates a sequence of numbers which are not truly random. The outputs of pseudorandom number generators only approximate some of the properties of random numbers. Although truly random numbers are believed to be generatable using hardware random number generators, pseudo-random numbers are important in simulations (rg, of physical systems with the Monte Carlo method), and are central in both the theory and practice of cryptography. Careful mathematical analysis is required to have any confidence the algorithm generates numbers that are sufficiently "random" to suit the intended use; Robert R. Coveyou of Oak Ridge National Laboratory once titled an article, "The generation of random numbers is too important to be left to chance."1 John von Neumann put essentially the same idea in a slightly different way, "Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin."2

Most pseudo-random algorithms produce sequences which are uniformly distributed by any of several tests. Common classes of these algorithms are linear congruential generators, lagged Fibonacci generators, linear feedback shift registers and generalised feedback shift registers. Recent instances of pseudo-random algorithms include Blum Blum Shub, Fortuna, and the Mersenne twister.

Contents

Inherent nonrandomness

Because any PRNG run on a deterministic computer (contrast quantum computer) is necessarily a deterministic algorithm, its output will inevitably have one property that a true random sequence can never have: guaranteed periodicity. It is certain that if such a generator uses only a fixed amount of memory that, given a sufficient number of iterations, the generator will revisit a previous internal state, after which it will repeat forever. A generator that isn't periodic can be designed, but its memory requirements would grow without limit as it ran. In addition, a PRNG can be started from an arbitrary starting point, or seed state, and will always produce an identical sequence from that point on.

The unacceptable significance of this periodicity can sometimes be avoided in practice. The length of the maximum period doubles with each bit of added memory, and it is thus easy to build PRNGs with periods so long that no computer could complete a single cycle in the expected lifetime of the universe. It is an open question, and one central to the theory and practice of cryptography, whether there is any way to distinguish the output of a well-designed PRNG from a truly random sequence (ie, white noise) without knowing, for instance, the algorithm and the seed with which it was initialized. Most cryptography relies on the assumption that it is infeasible to distinguish a suitable PRNG from a random sequence; the simplest example of this dependency is a stream cipher, which (most often) works by taking the exclusive or of the secret message with the output of a PRNG. The design of such PRNGs is extremely difficult, and most programs use much simpler PRNGs, even some cryptographic systems which should certainly not.

The algorithm of R P Brent can be used to determine the period length of determinstic generator algorithms; the result may surprise, but can be used to assist in evaluating the acceptability of a generator algorithm to a particular use.

Problems with deterministic generators

In practice, many common PRNGs exhibit artifacts which cause them to fail statistically significant tests. These include, but are certainly not limited to:

  • Shorter than expected periods for some seed states (ie, not a full period)
  • Poor dimensional distribution
  • Successive values interdependent
  • Some bits being 'more random' than others
  • Lack of uniformity

Defects exhibited by a flawed PRNG may range from unnoticeable to completely absurd. The RANDU random number algorithm used for decades on mainframe computers was flawed, and much research work of that time is less reliable than it should have been as a result. Sometimes, but not always, modeling problems are noticed and traced to inadequate generators, which in turn sometimes leads to improvements in PRNGs.

Early approaches

An early computer-based PRNG was suggested by John von Neumann in 1946; it is known as the middle-square method. The method was very simple: take any given number, square it, and remove the middle digits of the resulting number as your "random number", then use it as the seed for the next iteration. For example, squaring the number "1111" would result in "1234321", which might be written as "01234321", an 8-digit number being the square of a 4-digit number, providing "2343" as the "random" number. Repeating process would give you "4896" as the next result, and so on. Von Neumann used numbers of 10 digits in length, but the process was the same.

The problem with the "middle square" method is that even the best sequences must eventually repeat themselves, some doing so very quickly. Some sequences, such as "0000", will repeat quite promptly. Von Neumann was aware of such things, but for his purposes he found the approach sufficient, and its errors easy to detect (he was worried that mathematical "fixes" would simply hide the errors rather than get rid of them). On the ENIAC computer which he was using, the "middle square" method generated numbers at a rate some two hundred times more quickly than reading numbers from punch cards. In Von Neumann's view, hardware random number generators were questionable, because if they did not record the numbers generated, they could not later be tested for errors. If they did record their numbers, they would exhaust computer memories of the time, and the computer's ability to read and write numbers. If the numbers were written to cards, they would take much longer to write and read. The middle-square method has been supplanted for most purposes by more elaborate generators.

Mersenne twister

The 1997 recent invention of the Mersenne twister algorithm, by Makoto Matsumoto and Takuji Nishimura, avoids most of the problems with earlier generators. It has the colossal period of 219937-1 iterations (probably more than the number of computations which can be performed for the entire future existence of the universe), is proven to be equidistributed in (up to) 623 dimensions (for 32-bit values), and runs faster than all but the least statistically desirable generators. It is now increasingly becoming the "random number generator of choice" for statistical simulations and generative modeling.

However, it is possible to efficiently analyze the output of the Mersenne Twister algorithm and to recognize the numbers as being non-random (as with the Berlekamp-Massey algorithm or an extension from it, such as the Reed-Sloane algorithm). At this time this has had no known negative operational effects save making the Mersenne Twister unusable as a cryptographically secure PRNG.

A pen-and-paper method

The decimal expansion of reciprocal of the form 1/q, for "suitable" values of q provides an easily-implemented source of (pseudo-)random numbers [1]. Matthews showed that a sufficient condition for the method to produce a stream of random numbers of length q - 1 is that q = 2S + 1 where S is a Sophie Germain prime, and both S and 2S + 1 are of the form 3, 9 or 11 mod 20. Thus “suitable” prime numbers q are 7, 23, 47, 59, 167, 179, etc (corresponding to S = 3, 11, 23, 29, 83, 89, etc.). The result is a stream of length q - 1 digits (including leading zeros). So, for example, using q = 23 generates the random digits 0,4,3,4,7,8,2,6.....3,9,1,3


Cryptographically secure pseudorandom number generators

Main article: Cryptographically secure pseudo-random number generator

A PRNG suitable for cryptographic applications is called a cryptographically secure PRNG (CSPRNG). The chief difference between a PRNG and a CSPRNG can be summed up simply: a CSPRNG should appear indistinguishable from random on any examination, whereas normally a PRNG is only required to appear random to standard statistical tests. However, there are generator designs which meet this criterion, but which are not cryptographically strong.

Some classes of CSPRNGs include the following:

  • Stream ciphers or block ciphers running in counter or output feedback mode.
  • Special designs with a security proof. For example Blum Blum Shub has a strong conditional security proof (some internal values must remain secret). But it is slow, too slow for many applications.
  • PRNGs that have been designed specifically to be cryptographically secure. One example is ISAAC, which is fast and whose security recommendations feature, among others, a very large expected cycle time.

Evaluation criteria example

The German Institute for Security in Information Technology has established a four part criteria for quality of deterministic random number generators. They are summarized here:

  • K1 -- A sequence of random numbers with a high probablility of containing no identical consecutive elements.
  • K2 -- A sequence of numbers which is indistinguishable from 'true random' numbers according to specified statistical tests. The tests are the monobit test (equal numbers of ones and zeros in the sequence), poker test (a special instance of the chi-squared test), runs test (counts the frequency of runs of various lengths), longruns test (checks whether there exists any run of length 34 or greater in 20 000 bits of the sequence) -- both from BSI2 (AIS 20, v. 1, 1999) and FIPS (140-1, 1994), and the autocorrelation test. In essence, these requirements are a test of how well a bit sequence: has zeros and ones equally often; after a sequence of n zeros (or ones), the next bit a one (or zero) with probablility one-half; and any selected subsequence contains no information about the next element(s) in the sequence.
  • K3 -- It should be impossible for any attacker (for all practical purposes) to calculate, or otherwise guess, from any given sub-sequence, any previous or future values in the sequence, nor any inner state of the generator.
  • K4 -- It should be impossible, for all practical purposes, for an attacker to calculate, or guess from an inner state of the generator, any previous numbers in the sequence or any previous inner generator states.

For cryptographic applications, only generators meeting the K4 standard are really acceptable.

See also

Notes

1 Peterson. Ivars. The Jungles of Randomness: A Mathematical Safari. Wiley, NY, 1998. (pp. 178) ISBN 0-471-16449-6

2 "Various techniques used in connection with random digits", Applied Mathematics Series, no. 12, 36-38 (1951).

References

  • Donald Knuth. The Art of Computer Programming, Volume 2: Seminumerical Algorithms, Third Edition. Addison-Wesley, 1997. ISBN 0-201-89684-2. Chapter 3, pp.1–193. Extensive coverage of statistical tests for non-randomness.
  • R. Matthews Maximally Periodic Reciprocals Bulletin of the Institute of Mathematics and its Applications 28 147-148 1992
  • J. Viega, Practical Random Number Generation in Software, in Proc. 19th Annual Computer Security Applications Conference, Dec. 2003.
  • John von Neumann, "Various techniques used in connection with random digits," in A.S. Householder, G.E. Forsythe, and H.H. Germond, eds., Monte Carlo Method, National Bureau of Standards Applied Mathematics Series, 12 (Washington, D.C.: U.S. Government Printing Office, 1951): 36-38.

External links

de:Pseudozufallszahlengenerator eo:Vikipedio:Projekto matematiko/Pseŭdohazarda nombra generilo fr:Générateur de nombres pseudo-aléatoires it:Generatore di numeri pseudo-casuali ja:擬似乱数 pl:Generator liczb pseudolosowych ru:Генератор псевдослучайных чисел fi:Näennäissatunnaislukugeneraattori sv:Pseudoslumptalsgenerator tr:Sözderastsal sayı üreteci