MrTh
Obsidian | Level 7

hiya all

In SAS you can generate random numbers from a distribution using a chosen seed in each data step: same distribution + same seed in 2 different data steps = the same series of random numbers in both data steps.

Now I would like to sample from the same distribution (Normal) with the same seed in different data steps, but I don't want the same random draws.

Is there a trick to get 2 different series of random numbers from the same distribution and the same seed in 2 different data steps?

 

I thought of sampling once from the normal distribution with a chosen seed and creating a large data file of random numbers; each data step could then pick its values from that list.

Is there something cleverer, I wonder?

Many thanks

Accepted Solutions
Rick_SAS
SAS Super FREQ

You can simulate many (thousands of) samples using one seed. There are dozens of articles on my blog and a more systematic presentation in the book Simulating Data with SAS, but here's one article that contains several important points: "Simulate many samples from a linear regression model (including ANOVA)"
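A minimal sketch of the one-seed idea, under assumed sample counts: generate every sample in a single DATA step with one CALL STREAMINIT, tag each sample with a SampleID variable, and analyze all samples at once with a BY statement instead of a macro loop. The macro variable names, the sample sizes, and the use of PROC MEANS are illustrative assumptions, not taken from the linked article.

```sas
/* Sketch: many samples, one seed, one DATA step.
   NumSamples and SampleSize are assumed, illustrative values. */
%let NumSamples = 1000;   /* number of simulated samples (assumed) */
%let SampleSize = 50;     /* observations per sample (assumed)     */

data SimNormal;
   call streaminit(12345);              /* a single seed for the whole study */
   do SampleID = 1 to &NumSamples;
      do i = 1 to &SampleSize;
         x = rand("Normal");            /* standard normal variate */
         output;
      end;
   end;
run;

/* One BY-group pass computes the statistics for every sample;
   the data are already sorted by SampleID. */
proc means data=SimNormal noprint;
   by SampleID;
   var x;
   output out=SampleStats mean=SampleMean var=SampleVar;
run;
```

SampleStats then holds one row per simulated sample, and the distribution of SampleMean or SampleVar across rows is the simulation result.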


8 REPLIES
ballardw
Super User

Please explain why using the same seed is important in this case. The "seed" basically sets a position in the stream of random numbers the function generates.

 

I think this does something along the lines of what you are requesting, but without knowing how you intend to use it, it is hard to know whether it will suit your needs.

 

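data work.one;
  seed=15;                        /* same seed in both steps */
  do i= 1 to 10;
   call rannor(seed,x);           /* next normal variate; updates seed */
   if mod(i,2) = 0 then output;   /* keep the even-numbered draws */
  end;
run;

data work.two;
  seed=15;                        /* same seed, so the same underlying series */
  do i= 1 to 10;
   call rannor(seed,y);
   if mod(i,2) = 1 then output;   /* keep the odd-numbered draws */
  end;
run;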

To verify that you are getting numbers from the same series, remove the "if mod() = ... then output" condition in one or both data steps.
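For the original question (same seed, different series in separate DATA steps), newer SAS releases may offer a more direct route than splitting one stream into even and odd draws. The sketch below assumes SAS 9.4M5 or later, where CALL STREAMINIT can name the PCG generator and the CALL STREAM routine selects a stream key for the same seed; the key values 1 and 2 are arbitrary illustrative choices.

```sas
/* Sketch, assuming SAS 9.4M5 or later (PCG generator and CALL STREAM).
   Both DATA steps use the same seed; a different stream key in each
   step is assumed to give a different but reproducible series. */
data work.one;
   call streaminit('PCG', 15);      /* same generator and seed in both steps */
   call stream(1);                  /* stream key 1 */
   do i = 1 to 5;
      x = rand("Normal");
      output;
   end;
run;

data work.two;
   call streaminit('PCG', 15);      /* same seed as work.one ...     */
   call stream(2);                  /* ... but a different stream key */
   do i = 1 to 5;
      y = rand("Normal");
      output;
   end;
run;
```

Rerunning either step with the same seed and key should reproduce its values, which keeps the "same seed" property without the two steps sharing draws.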

 

MrTh
Obsidian | Level 7

Hi ballardw,

Thanks for the reply.

I suppose I should have elaborated a bit more.

This is a genetic algorithm in which I will repeat the simulation a large number of times (~1,000) in order to estimate variances and covariances between traits.

Keeping the seed the same across all the repetitions lets me avoid creating spurious correlations due to changing seeds. With the same seed I can sample from my Normal distribution without having to worry about that sort of bias.

Your solution looks like a good alternative to my giant file of random numbers 🙂 I'll give it a go.

Another approach I thought of testing would be to use different seeds (bear with me) but chosen randomly.

Anyway, I really appreciate your comments.

Cheers!

 

Rick_SAS
SAS Super FREQ

First, you should read this article about random number streams and how they are controlled by the seed value.

 

Issues like this sometimes come up when SAS users are trying to run a simulation study and decide to wrap the study in a macro that takes a single seed as an argument. Sometimes programmers are trying to use an inefficient macro loop to generate the data, instead of an efficient simulation that only requires a single seed.

 

An efficient answer to your question would take into account the task you are trying to accomplish. Could you say more about what you are trying to do?  

 

Without further details, the best I can offer is to mention that the OUTPUT statement takes an optional argument that specifies the output data set. Thus the following DATA step writes the first 1000 observations to the data set 'A', the next 5000 observations to 'B', and the last 100 observations to 'C':

 

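data A B C;
call streaminit(12345);      /* one seed for the whole DATA step */
do i = 1 to 1000;
   x = rand("Normal");
   output A;                 /* first 1000 values go to A */
end;
do i = 1 to 5000;
   x = rand("Normal");
   output B;                 /* next 5000 values go to B */
end;
do i = 1 to 100;
   x = rand("Normal");
   output C;                 /* last 100 values go to C */
end;
run;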

However, this approach has no intrinsic advantage over using multiple DATA steps, each with a different random seed.

MrTh
Obsidian | Level 7

Hi Rick_SAS,

Thanks for the comment.

I gave ballardw a bit more detail above about the why.

Thanks for the article link; I will definitely study it.

Your solution is also worth a try, as it would avoid creating the big file of random numbers I am using now, which makes the code sluggish.

One thing I didn't mention is that I am reproducing in SAS an algorithm written in Fortran, where the random numbers were drawn with a NAG subroutine. I don't know much about Fortran, hence my SAS effort.

Many thanks for the comment Rick_SAS

 

Cheers

 

hunbes
SAS Employee
Hi Rick,
I ran your data step and then a second one with a small change in the second DO loop:
I changed the distribution to Poisson.
Why are the C1 and C2 data sets totally different, rather than only B2 being changed?

data A1 B1 C1;
   call streaminit(12345);
   do i = 1 to 1000;
      x = rand("Normal");
      output A1;
   end;
   do i = 1 to 5000;
      x = rand("Normal");
      output B1;
   end;
   do i = 1 to 100;
      x = rand("Normal");
      output C1;
   end;
run;
data A2 B2 C2;
   call streaminit(12345);
   do i = 1 to 1000;
      x = rand("Normal");
      output A2;
   end;
   do i = 1 to 5000;
      x = rand("POIS",0.2);   /* only change: Poisson instead of Normal */
      output B2;
   end;
   do i = 1 to 100;
      x = rand("Normal");
      output C2;
   end;
run;
proc compare base=c1 compare=c2;
run;

thx,
Berni
Rick_SAS
SAS Super FREQ

Because a random number generator generates a stream of UNIFORM numbers, which are then transformed to create nonuniform variates. A call to RAND may use one or more uniform random variates from the underlying random number generator. In this case, although you generate 5000 "Normal" values in one DATA step and 5000 "Poisson" values in the other, the two loops consume different numbers of uniform variates to form those nonuniform values. As a result, the underlying stream is at a different position when the third loop starts, so C1 and C2 receive different values.

 

I discuss this a little bit in my paper "Tips and Techniques for Using the Random-Number Generators in SAS", where I say (p. 2) "distributions (such as the normal) might require multiple uniform variates." The paper also describes how RNGs work and discusses the underlying stream of uniform values.
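One way to get the behavior Berni expected, sketched under assumptions: give each block its own random-number stream, so the number of uniforms consumed in block B cannot shift the values in block C. This assumes SAS 9.4M5 or later, where CALL STREAMINIT accepts the 'PCG' generator name and CALL STREAM(key) selects a stream for the same seed; it also assumes that CALL STREAM starts the selected stream from its beginning, so check the PROC COMPARE result on your release. The keys 1, 2, and 3 are arbitrary.

```sas
/* Sketch, assuming SAS 9.4M5+ (PCG generator and CALL STREAM).
   Each block draws from its own stream, so changing the distribution
   in block B is expected to leave block C unchanged. */
data A1 B1 C1;
   call streaminit('PCG', 12345);
   call stream(1);                  /* stream for the A block */
   do i = 1 to 1000;
      x = rand("Normal");
      output A1;
   end;
   call stream(2);                  /* stream for the B block */
   do i = 1 to 5000;
      x = rand("Normal");
      output B1;
   end;
   call stream(3);                  /* stream for the C block */
   do i = 1 to 100;
      x = rand("Normal");
      output C1;
   end;
run;

/* Same program, except block B now draws Poisson variates */
data A2 B2 C2;
   call streaminit('PCG', 12345);
   call stream(1);
   do i = 1 to 1000;
      x = rand("Normal");
      output A2;
   end;
   call stream(2);
   do i = 1 to 5000;
      x = rand("Poisson", 0.2);     /* uses a different number of uniforms */
      output B2;
   end;
   call stream(3);
   do i = 1 to 100;
      x = rand("Normal");
      output C2;
   end;
run;

/* Expect no differences between C1 and C2 */
proc compare base=C1 compare=C2;
run;
```

The design choice is to tie each logical block of the simulation to a fixed stream key, so edits to one block stay local to that block.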

hunbes
SAS Employee
Thanks for the clear explanation. We will modify our simulation process.

