BookmarkSubscribeRSS Feed
michaelT
Calcite | Level 5

 

I'm trying to figure out why I'm getting different answers between the original C-implementation of the Mersenne-Twister algorithm and the rand function in SAS.

 

If I run the C-code (from here: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html), with a seed of 5489 the first five random numbers generated are:

3499211612      581869302       3890346734      3586334585      545404204

 

This produces unsigned 32-bit integers.  To get something that I can compare with rand('uniform') in SAS I've just divided by 4294967295 = (2^32-1), which gives: 

 

0.814723692,  0.135477004,  0.905791934,  0.83500859, 0.12698681    ("series 1")

 

I've then run the following data step in SAS with the intention of getting the same as series 1:

data A;
call streaminit(5489);       /* set random number seed */
do i = 1 to 5;
   u = rand("Uniform");     /* u ~ U(0,1) */
   output;
end;
run;

 

But this generates the following sequence:

0.204043448,  0.4575133489, 0.3245377645, 0.1522461593, 0.6659437388

 

I'm a bit stumped as to why these are different.  I've looked at the SAS documentation on the rand function already (http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a001466748.htm), but it doesn't help.

 

The version of SAS I'm using is 9.3 (32 bit).

 

Very grateful for any suggestions!

Thanks
Michael

5 REPLIES 5
ballardw
Super User

First stop: numeric precision

Second likely: the Seed value usage may vary between the algorithms

And just because the algorithm is the same I dont' think I would expect the exact same series between different implementations.

michaelT
Calcite | Level 5

Thanks ballardw.

 

Re: 1) numerical precision.  Yes, this could well be a cause.  I don't know how SAS handles things like integers.  My guess is that SAS might not use 'integers', as you don't have to declare data types in SAS (as you do in C).  But I'd be a little surprised if it couldn't store integers (which are all that is used through the MT algortihm) without loss of precision.

 

Re: 2) Seed value usage.  Interesting - I think you're suggesting that  SAS might do something with the seed supplied to create a new seed from the seed it's been given?  From what I've read the mersenne twister ideally needs a few iterations to really get going into (pseudo) randomness, so maybe SAS does s/thing to take account of this?

 

If anyone knows any further details about how MT is implemented in SAS, that would be really helpful.  If it is around seed value usage (i.e. 2 above) then that gives me hope that I could recreate, given details of what SAS is doing.  But at the moment it's just a black box where the output doesn't match what I'd like!

ballardw
Super User

My suggestion about precision has more to do with the results of the division. Did you do the division in SAS or some other program?

 

The seed you used in the call streaminit just generates a reproducible stream from the rand function. It does not necessarily start at the position in the sequence of values the code in C would start with.

 

An exercise for the interested reader would be to test all of the 2**31 -1 acceptable values of seed to see if any of them match the series (or somewhat closely) you seem interested in.

This should generate a big data set of values with different streams:

data mtseeds;
   do seed= 1 to 2147483647;
      if seed=1 then do;
         call execute("data mtseeds;call streaminit("||seed||
         ");
          array s {5};
          do i= 1 to 5;
            s[i] = rand('uniform');
          end;
          output;
          drop i;
          run;");
      end;
      else do;
         call execute("data temp;call streaminit("||seed||
         ");
          array s {5};
          do i= 1 to 5;
            s[i] = rand('uniform');
          end;
          output;
          drop i;
          run;
          Proc append base=mtseeds data=temp;run;");

      end;
   end;
run;

First try that doesn't work:

 

 

data mtseeds;
   array s {5};
   do seed= 1 to 2147483647;
      call streaminit(seed);
      do i= 1 to 5;
         s[i] = rand('uniform');
      end;
      output;/* or put a test for your expected 5 values and output if the delta is "small"*/
   end;
   drop i;
run;

 

However since the period of the MT is 2**19937 -1 the odds seem somewhat unlikely.

michaelT
Calcite | Level 5

I just did the division in excel - but given the output from SAS is so far apart from my series 1 I don't think that is an issue.  (I think that I am missing your point here, so apologies).

 

Your suggestion about trying to figure out where SAS is starting from is an interesting one, thanks.  The MT state vector is transformed ("twisted") after each 624 random numbers from it, producing a new MT state vector, which then produces another 624 random numbers and is then twisted again, and so on.  So I guess I could  take the first 624 random numbers produced from SAS and convert into integers (and maybe here rounding is a problem) by multiplying by 2^32 - 1.  And if I could use this state vector as the starting point in the C-code it would at least confirm that the SAS implementation of MT is doing the same thing as in C, given the same MT vector state.  

 

Though even if I did that it still leaves the problem of what SAS is doing at the start with the seed that is different from the C-implementation.  I suspect that is not easy to crack without further information.

 

Anyway, thanks for your thoughts on this.

Rick_SAS
SAS Super FREQ

@ballardw: This DATA step does not generate many independent streams. It creates one long stream from seed=1. See "Random number streams in SAS: How do they work?", which includes the statement "It does not make sense to call STREAMINIT multiple times within the same DATA step; after the RAND function is called, subsequent calls to STREAMINIT are ignored. "

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 738 views
  • 2 likes
  • 3 in conversation