For more information on the SAS Random Number Generator, see here.
All the SAS RNGs named RANxxx are based on RANUNI and use some transform, inversion, or acceptance/rejection method to generate pseudorandom number streams with various other distributional properties. UNIFORM() is an alias for RANUNI(), and NORMAL() is an alias for RANNOR().
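The inversion idea can be sketched briefly. This example does not claim to show how any particular RANxxx function is implemented, and Python's `random` module stands in for the RANUNI uniform stream. If F is a target CDF, then F^-1(U) has that distribution when U is uniform. For the Exp(1) distribution this gives -log(1 - u), or equivalently -log(u), because 1 - U is also uniform. The helper `exponential_by_inversion` is hypothetical.

```python
import math
import random

def exponential_by_inversion(u):
    """Map a uniform draw u in (0, 1) to an Exp(1) draw via the inverse CDF.

    The Exp(1) CDF is F(x) = 1 - exp(-x), so F^-1(u) = -log(1 - u);
    since 1 - U is also uniform, -log(u) works equally well.
    """
    return -math.log(u)

rng = random.Random(2024)  # stand-in uniform stream (Mersenne Twister, not RANUNI)
draws = []
for _ in range(100_000):
    u = rng.random()
    while u == 0.0:            # random() can return 0.0, and log(0) is undefined
        u = rng.random()
    draws.append(exponential_by_inversion(u))

print(sum(draws) / len(draws))  # sample mean; Exp(1) has mean 1
```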
RANUNI() uses a prime-modulus multiplicative linear congruential generator (as described in the SAS documentation), in which each new seed is computed from the previous one as

$$\text{seed}_i = \operatorname{mod}\left(a \cdot \text{seed}_{i-1},\ 2^{31}-1\right), \qquad a = 397204094,$$

and then returns

$$u_i = \frac{\text{seed}_i}{2^{31}-1}$$

as the uniform random number. When using the SAS random number functions or subroutines, one should specify a SEED in the range 1 to 2**31-2 to initialize the starting point of the pseudorandom number stream, or use a nonpositive integer (0 or negative) to create the initial seed from the system clock. If a nonpositive number is used, SAS reads the system time and computes the initial seed value using an algorithm equivalent to
SEED = 1e3 * mod(round(1e3 * datetime()), 1e6) + 1;
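The recurrence and the clock-based seed can be sketched in a few lines. This sketch assumes the update seed_i = mod(397204094 * seed_{i-1}, 2**31-1) with u_i = seed_i/(2**31-1), and it is not guaranteed to reproduce SAS output bit for bit. The names `ranuni_sketch` and `clock_seed` are hypothetical. `clock_seed` takes a SAS datetime value (seconds since 01JAN1960) as an ordinary number and applies the seed formula given in the text.

```python
import math

M = 2**31 - 1      # modulus
A = 397204094      # multiplier documented for RANUNI

def ranuni_sketch(seed, n):
    """Return n uniforms from the prime-modulus multiplicative LCG, plus the final seed."""
    if not 1 <= seed <= M - 1:
        raise ValueError("seed must be in 1 .. 2**31 - 2")
    out = []
    for _ in range(n):
        seed = (A * seed) % M   # seed_i = mod(a * seed_{i-1}, 2**31 - 1)
        out.append(seed / M)    # u_i = seed_i / (2**31 - 1)
    return out, seed

def clock_seed(sas_datetime):
    """SEED = 1e3 * mod(round(1e3 * datetime()), 1e6) + 1, for datetime >= 0.

    SAS ROUND() rounds halves away from zero; floor(x + 0.5) matches that for x >= 0.
    """
    ms = math.floor(1e3 * sas_datetime + 0.5)
    return 1000 * (ms % 1_000_000) + 1

us, last = ranuni_sketch(1, 3)
print(us, last)
print(clock_seed(2_000_000_000.25))  # 250001
```

The clock formula always yields a seed between 1 and 999,999,001, which is inside the valid range of 1 to 2**31-2.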
The period of the RANUNI generator is 2**31 - 2. (The SAS documentation says the period is 2**31 - 1, but the sequence repeats after 2**31 - 2 numbers.) To illustrate what this means, examine the following simple RNG.
data random;
   seed = 6;   *seed must be between 1 and 30 (i.e. 2**5-2);
   do _n_ = 1 to 100;
      seed = mod(3*seed, 2**5-1);
      urand = seed/(2**5-1);
      output;
   end;
run;
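The period for this generator is 2**5-2 = 30: it generates a sequence of 30 numbers and then repeats that sequence over and over again. In the above example, the first update gives

$$\text{seed} = \operatorname{mod}(3 \cdot 6,\ 31) = 18, \qquad \text{urand} = \frac{18}{31} \approx 0.5806.$$

The next time a pseudorandom number is called for:

$$\text{seed} = \operatorname{mod}(3 \cdot 18,\ 31) = \operatorname{mod}(54,\ 31) = 23, \qquad \text{urand} = \frac{23}{31} \approx 0.7419.$$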
With an initial seed of 6, the sequence of updated values generated will be,
18 23 7 21 1 3 9 27 19 26 16 17 20 29 25 13 8 24 10 30 28 22 4 12 5 15 14 11 2 6
and that cycle will continue repeating for as long as the DATA step continues to run. These numbers are not computed in advance, stored, and looked up when needed; rather, they are computed on the fly as requested. But because the process is deterministic, we know in what order they will occur. If you change the initial seed to, say, 22, the sequence will begin at a different place,
4 12 5 15 14 11 2 6 18 23 7 21 1 3 9 27 19 26 16 17 20 29 25 13 8 24 10 30 28 22
but it will be the same ordering of updated seeds (and uniform random numbers). If you find 4 in the previous sequence (where the initial seed was 6), you will see that it is followed by exactly the numbers produced here with an initial seed of 22, with the numbers "wrapping around" to the beginning of the sequence.
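The toy DATA step translates directly into a short Python sketch. It reproduces both listed sequences and checks that the seed-22 sequence is the seed-6 cycle started at the position of 4. The helper name `toy_stream` is hypothetical.

```python
def toy_stream(seed, n, a=3, m=2**5 - 1):
    """Return n updated seeds from the toy generator seed = mod(3*seed, 31)."""
    seeds = []
    for _ in range(n):
        seed = (a * seed) % m
        seeds.append(seed)
    return seeds

from_6 = toy_stream(6, 30)
from_22 = toy_stream(22, 30)
print(from_6)
print(from_22)

# The seed-22 sequence is the seed-6 cycle rotated to start where 4 appears.
start = from_6.index(4)
print(from_22 == from_6[start:] + from_6[:start])  # True
```

Because each new seed depends only on the previous one, any seed on the cycle determines everything that follows; the initial seed only chooses the entry point.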
Obviously, this toy RNG is not very useful because it doesn't produce enough unique numbers and the period is too short. However, it illustrates how RANUNI works. As stated above, the period for RANUNI is 2**31-2, or 2,147,483,646. It will generate a sequence of 2**31-2 pseudorandom numbers before it starts repeating.
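The period can be found by brute force for small moduli: apply the update repeatedly until the starting seed comes back. In this sketch, the hypothetical `period` function assumes a prime modulus and a multiplier that is not a multiple of it, so every seed lies on a cycle. The second call, with the multiplier 2 (chosen here for illustration), shows that the multiplier, not just the modulus, decides whether the maximum period m - 1 is reached. The same loop for m = 2**31-1 would need up to about 2.1 billion steps, which is slow in pure Python.

```python
def period(a, m, seed=1):
    """Count updates until seed = mod(a*seed, m) returns to its starting value.

    Assumes m is prime, 1 <= seed <= m - 1 and a is not a multiple of m,
    so multiplication by a permutes the seeds and the loop terminates.
    """
    if not (1 <= seed < m and a % m != 0):
        raise ValueError("need 1 <= seed < m and a not divisible by m")
    start = seed
    steps = 0
    while True:
        seed = (a * seed) % m
        steps += 1
        if seed == start:
            return steps

print(period(3, 31))  # 30 = 2**5 - 2, the maximum for m = 31
print(period(2, 31))  # 5, because 2**5 = 32 = 1 (mod 31)
```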
The RANxxx functions and CALL RANxxx subroutines use the same algorithm but differ in how the seed is managed. With the CALL versions, one can change the seed to change the stream of numbers, and one can maintain separate number streams with different starting points. With the FUNCTION versions, the starting point in the random number stream is fixed by the first reference to a FUNCTION RNG, and all FUNCTION RNG calls use the same stream of seeds.
In the following DATA step, seed1 and seed2 are set and never change. If one takes the values of r1 and r2 together, in the order they are generated (r1, r2, r1, r2, ...), they match the sequence of numbers in r3. This is because the seed for the RANxxx functions is set by the first use of any function in the family, and all the RANxxx functions draw from the same sequence of numbers. The CALL RANxxx subroutines, however, can be started at different points in the sequence. The values of seed3 and seed4 change with each call, and they can be changed to alter the output of the call. This is not recommended: unless one is very careful, it is possible to introduce serial correlation into the sequence of numbers or otherwise modify the sequence in unhelpful ways.
data _null_;
   seed1 = 1;
   seed2 = 3271985;
   retain seed3 1 seed4 3271985;
   do _n_ = 1 to 10;
      r1 = ranuni(seed1);
      r2 = ranuni(seed2);
      call ranuni(seed3, r3);
      call ranuni(seed4, r4);
      put _all_;
   end;
run;
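The seed-handling difference can be modeled in Python. This is a simplified model of the behavior described above, not SAS internals, and it assumes the RANUNI recurrence with multiplier 397204094 and modulus 2**31-1. `FunctionStream` mimics the functions: there is one shared state, and only the first seed argument is used. `call_ranuni` mimics the CALL routine: the caller holds the seed, and the routine returns the updated seed. Both names are hypothetical. The loop mirrors the DATA step.

```python
M = 2**31 - 1
A = 397204094  # RANUNI multiplier; this is a model, not SAS itself

def lcg_next(seed):
    return (A * seed) % M

class FunctionStream:
    """Models the RANxxx functions: one shared stream, seeded only by the first call."""
    def __init__(self):
        self.state = None

    def ranuni(self, seed):
        if self.state is None:   # first reference to any function RNG
            self.state = seed    # later seed arguments are ignored
        self.state = lcg_next(self.state)
        return self.state / M

def call_ranuni(seed):
    """Models CALL RANUNI(seed, x): returns (updated seed, x)."""
    seed = lcg_next(seed)
    return seed, seed / M

fn = FunctionStream()
seed1, seed2 = 1, 3271985
seed3, seed4 = 1, 3271985
r1s, r2s, r3s, r4s = [], [], [], []
for _ in range(10):
    r1s.append(fn.ranuni(seed1))
    r2s.append(fn.ranuni(seed2))  # seed2 is ignored: seed1 already started the stream
    seed3, r3 = call_ranuni(seed3)
    seed4, r4 = call_ranuni(seed4)
    r3s.append(r3)
    r4s.append(r4)

interleaved = [x for pair in zip(r1s, r2s) for x in pair]
print(interleaved[:10] == r3s)  # True: r1 and r2 together follow r3's stream
print(r3s == r4s)               # False: seed4 starts elsewhere in the sequence
```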
The newest SAS RNG is RAND (see also RANDGEN), which is available in SAS 9.1 and later. This RNG is based on the Mersenne Twister algorithm. It is much more complicated than the linear congruential algorithm and, according to the SAS documentation, has a cycle length of 2**19937-1. Most SAS users (and I include myself in this group) will simply have to accept what the number theorists say about the properties of this RNG, as there is no way to generate this many numbers in a lifetime with current computing power.
To select the seed for the RAND function, use the CALL STREAMINIT routine. Do this once before using RAND() to generate any random numbers. If you don't use CALL STREAMINIT or if you specify a non-positive seed, the seed will be set using the system clock.
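The seed-once pattern can be sketched in Python, whose standard `random` module is also built on the Mersenne Twister (MT19937). The analogy is only loose: the sketch will not reproduce the numbers that SAS RAND produces. Creating `random.Random(seed)` plays the role of CALL STREAMINIT, and the hypothetical `simulate` function then draws from that single stream.

```python
import random

def simulate(seed, n=5):
    """Seed once (the analogue of CALL STREAMINIT), then draw n uniforms."""
    rng = random.Random(seed)  # Python's core generator is MT19937
    return [rng.random() for _ in range(n)]

run1 = simulate(12345)
run2 = simulate(12345)
run3 = simulate(54321)
print(run1 == run2)  # True: same seed, identical results
print(run1 == run3)  # False: a different seed gives a different stream
```

Fixing the seed once at the start is what makes a run repeatable later.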
One could argue that for most of the situations that the average SAS user will encounter, the RANxxx functions or subroutines are adequate.
The period (cycle length) is very important; the longer the better. For serious statistical work, such as simulations that generate a lot of data that needs to be "as random as possible", one ought to use the RAND() function, which has a longer period and better properties than the RANxxx functions.
Changing the seed when using the CALL RANxxx subroutines could lead to serial correlations or other disturbances in the properties of the RNGs.
It doesn't matter which seed one uses, as long as the seed is within the range specified in the documentation and one doesn't use the same seed every time an RNG is initialized (reusing a seed reproduces the same stream). Again, the seed only sets where one begins in the pseudorandom sequence of numbers. The sequence itself is fixed by the specific algorithm used, so the algorithm and the initial seed together determine the stream one gets.
No seed needs to be avoided. As long as the seed meets the requirements of the method used it is appropriate.
Specifying a seed lets you replicate your results, for example if a client or boss asks you to show how you got them.
The RANxxx functions produce the same stream of random numbers as the corresponding CALL subroutines; they differ only in how the seed is managed. With the CALL versions, one can change the seed to change the stream, and one can maintain separate number streams with different starting points. With the function versions, the starting point is fixed by the first reference to any function RNG, and all function RNG calls share that one stream.
The SAS community is encouraged to chime in here, but at least one benefit is the ease with which one can do "random sampling".
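One common DATA-step idiom for random sampling is selection sampling. It picks exactly k of n records in a single pass by keeping each record with probability (records still needed) / (records still left). The sketch below uses Python's `random` module in place of RANUNI, and `select_k_of_n` is a hypothetical name.

```python
import random

def select_k_of_n(n, k, seed):
    """Simple random sample of k record numbers from 0..n-1 in one sequential pass."""
    rng = random.Random(seed)
    chosen = []
    needed = k
    for i in range(n):
        left = n - i
        # When needed == left the probability is 1, so exactly k are always chosen.
        if rng.random() < needed / left:
            chosen.append(i)
            needed -= 1
            if needed == 0:
                break
    return chosen

sample = select_k_of_n(1000, 10, seed=42)
print(sample)
```

With the same seed, the same records are selected, which ties sampling back to replicability.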
Random number generators have also proven useful in data masking. They can also provide numerical keys for encryption purposes in conjunction with bit operations in SAS (for more information, look up the bitwise XOR function, BXOR, in the SAS documentation).
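XOR masking with random keys can be sketched in a few lines. Here Python's `^` operator and `random.getrandbits` stand in for the SAS bit functions and RNGs. XOR is its own inverse, so regenerating the same keystream from the same seed restores the original values. The helper `xor_mask` is hypothetical. This is obfuscation rather than strong encryption, because the keys come from a non-cryptographic generator.

```python
import random

def xor_mask(values, seed):
    """Mask (or unmask) integers by XOR with a keystream from a seeded generator.

    Not cryptographically secure: the keystream comes from Mersenne Twister.
    """
    rng = random.Random(seed)
    return [v ^ rng.getrandbits(32) for v in values]

ids = [1001, 1002, 1003, 2024]
masked = xor_mask(ids, seed=7)
restored = xor_mask(masked, seed=7)
print(masked)
print(restored == ids)  # True
```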
Random number generators are also useful to people running Monte Carlo simulations.
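A classic small Monte Carlo example estimates pi from the fraction of random points in the unit square that fall inside the quarter circle x^2 + y^2 <= 1. That fraction is about pi/4. The sketch uses Python's `random` module as the uniform source, and `estimate_pi` is a hypothetical name. Fixing the seed makes the estimate reproducible.

```python
import random

def estimate_pi(n, seed):
    """Estimate pi as 4 times the share of n random points inside the quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4 * inside / n

est = estimate_pi(200_000, seed=2025)
print(est)
```

With 200,000 points, the standard error of the estimate is roughly 0.004, so more points give a tighter estimate at the cost of longer run time.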
Probably. Others are welcome to add neglected topics, corrections, or otherwise reorganize things.
One major thing to note is that while these random number generators are statistically random, they are not cryptographically secure generators [1].
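A linear congruential generator is easy to predict: a single output reveals the entire internal state. The sketch below assumes the RANUNI recurrence seed_i = mod(397204094 * seed_{i-1}, 2**31-1) with u_i = seed_i/(2**31-1), and the secret seed is an arbitrary illustrative value. An observer who sees only u1 recovers the seed and predicts the next number exactly. For cryptographic purposes, Python provides a secure generator in its `secrets` module.

```python
M = 2**31 - 1
A = 397204094  # RANUNI multiplier, used here as a model

def next_seed(seed):
    return (A * seed) % M

# Someone draws two numbers from the RANUNI-style recurrence.
secret_seed = 123456789
s1 = next_seed(secret_seed)
s2 = next_seed(s1)
u1, u2 = s1 / M, s2 / M

# An observer who sees only u1 recovers the state and predicts u2.
recovered = round(u1 * M)
predicted = next_seed(recovered) / M
print(predicted == u2)  # True
```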
This article was originally published by Djnordlund on sasCommunity.org.