turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- random numbers generated by rand function inconsis...

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-31-2017 06:30 AM

I'm trying to figure out why I'm getting different answers between the original C-implementation of the Mersenne-Twister algorithm and the rand function in SAS.

If I run the C-code (from here: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html), with a seed of **5489** the first five random numbers generated are:

3499211612 581869302 3890346734 3586334585 545404204

This produces unsigned 32-bit integers. To get something that I can compare with rand('uniform') in SAS I've just divided by 4294967295 = (2^32-1), which gives:

0.814723692, 0.135477004, 0.905791934, 0.83500859, 0.12698681 ("series 1")

I've then run the following data step in SAS with the intention of getting the same as series 1:

```
data A;
call streaminit(5489); /* set random number seed */
do i = 1 to 5;
u = rand("Uniform"); /* u ~ U(0,1) */
output;
end;
run;
```

But this generates the following sequence:

0.204043448, 0.4575133489, 0.3245377645, 0.1522461593, 0.6659437388

I'm a bit stumped as to why these are different. I've looked at the SAS documentation on the rand function already (http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a001466748.htm), but it doesn't help.

The version of SAS I'm using is 9.3 (32 bit).

Very grateful for any suggestions!

Thanks

Michael

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to michaelT

01-31-2017 10:10 AM

First stop: numeric precision

Second likely: the Seed value usage may vary between the algorithms

And just because the algorithm is the same I dont' think I would expect the exact same series between different implementations.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ballardw

01-31-2017 10:53 AM

Thanks ballardw.

Re: 1) numerical precision. Yes, this could well be a cause. I don't know how SAS handles things like integers. My guess is that SAS might not use 'integers', as you don't have to declare data types in SAS (as you do in C). But I'd be a little surprised if it couldn't store integers (which are all that is used through the MT algortihm) without loss of precision.

Re: 2) Seed value usage. Interesting - I think you're suggesting that SAS might do something with the seed supplied to create a new seed from the seed it's been given? From what I've read the mersenne twister ideally needs a few iterations to really get going into (pseudo) randomness, so maybe SAS does s/thing to take account of this?

If anyone knows any further details about how MT is implemented in SAS, that would be really helpful. If it is around seed value usage (i.e. 2 above) then that gives me hope that I could recreate, given details of what SAS is doing. But at the moment it's just a black box where the output doesn't match what I'd like!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to michaelT

01-31-2017 11:30 AM - edited 01-31-2017 02:03 PM

My suggestion about precision has more to do with the results of the division. Did you do the division in SAS or some other program?

The seed you used in the call streaminit just generates a reproducible stream from the rand function. It does not necessarily start at the position in the sequence of values the code in C would start with.

An exercise for the interested reader would be to test all of the 2**31 -1 acceptable values of seed to see if any of them match the series (or somewhat closely) you seem interested in.

This should generate a big data set of values with different streams:

data mtseeds; do seed= 1 to 2147483647; if seed=1 then do; call execute("data mtseeds;call streaminit("||seed|| "); array s {5}; do i= 1 to 5; s[i] = rand('uniform'); end; output; drop i; run;"); end; else do; call execute("data temp;call streaminit("||seed|| "); array s {5}; do i= 1 to 5; s[i] = rand('uniform'); end; output; drop i; run; Proc append base=mtseeds data=temp;run;"); end; end; run;

First try that doesn't work:

data mtseeds; array s {5}; do seed= 1 to 2147483647; call streaminit(seed); do i= 1 to 5; s[i] = rand('uniform'); end; output;/* or put a test for your expected 5 values and output if the delta is "small"*/ end; drop i; run;

However since the period of the MT is 2**19937 -1 the odds seem somewhat unlikely.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ballardw

01-31-2017 12:14 PM

I just did the division in excel - but given the output from SAS is so far apart from my series 1 I don't think that is an issue. (I think that I am missing your point here, so apologies).

Your suggestion about trying to figure out where SAS is starting from is an interesting one, thanks. The MT state vector is transformed ("twisted") after each 624 random numbers from it, producing a new MT state vector, which then produces another 624 random numbers and is then twisted again, and so on. So I guess I could take the first 624 random numbers produced from SAS and convert into integers (and maybe here rounding is a problem) by multiplying by 2^32 - 1. And if I could use this state vector as the starting point in the C-code it would at least confirm that the SAS implementation of MT is doing the same thing as in C, given the same MT vector state.

Though even if I did that it still leaves the problem of what SAS is doing at the start with the seed that is different from the C-implementation. I suspect that is not easy to crack without further information.

Anyway, thanks for your thoughts on this.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ballardw

01-31-2017 01:47 PM

@ballardw: This DATA step does not generate many independent streams. It creates one long stream from seed=1. See "Random number streams in SAS: How do they work?", which includes the statement "It does not make sense to call STREAMINIT multiple times within the same DATA step; after the RAND function is called, subsequent calls to STREAMINIT are ignored. "