I am trying to randomize the numbers in my dataset for a few different variables. The macro that I am using to do this is currently:
%macro RandBetween(min, max);
(&min + ((1+&max-&min)*abs(rand("normal"))))
%mend;
I think my understanding of a normal distribution isn't correct since %RandBetween(0,1) seems to be returning numbers greater than 1 as well. This obviously will return an incorrect 'percentage' variable since I want that to have a maximum value of 1 (100%). Consequently, other variables that are generated like this:
avg_opioid = %RandBetween(0,&max_avg_opioid.);
also seem to have values greater than their max (presumably because the rand("normal") function is returning numbers greater than 1). I have also tried rand("normal",0.5,0.5), but that doesn't seem to help either. At this point, I think my understanding of the normal distribution may be skewed.
How do I go about limiting the return of rand("normal") to a min and max of 0 and 1 respectively?
After the macro is resolved, you get
(0 + ((1+1-0)*abs(rand("normal"))))
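so the draw is multiplied by 2 rather than 1, which doubles the spread of the results. (abs(rand("normal")) is itself unbounded, so values above the maximum remain possible either way, just less often.)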
Change your macro to this:
%macro RandBetween(min, max);
(&min + ((&max-&min)*abs(rand("normal"))))
%mend;
Are you sure you want to create a 'percentage variable' using the normal distribution? An N(0,1) distribution is not restricted to values between 0 and 1. It is a normal distribution with mean 0 and variance 1.
If you do not actually need the normal distribution, then simply do this to get a value between 0 and 1:
data _null_;
x=rand('uniform');
put x;
run;
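If a uniform draw is acceptable, the same min/max scaling as the original macro can be kept. This is only a sketch: the macro name UnifBetween is made up for illustration and does not appear in the thread. It relies on rand('uniform') returning values strictly between 0 and 1, so the result stays between min and max.

```sas
/* Hypothetical helper (name is an assumption, not from the thread): */
/* scales a uniform draw on (0,1) to the interval (min, max).        */
%macro UnifBetween(min, max);
    (&min + ((&max - &min) * rand('uniform')))
%mend;

data _null_;
    call streaminit(12345);      /* arbitrary seed, for repeatable draws */
    pct = %UnifBetween(0, 1);    /* always between 0 and 1 */
    put pct=;
run;
```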
Unfortunately, for the purposes of what I am trying to do with the data, I need it to be a normal distribution.
The "problem" with the normal distribution is that it does not have defined lower/upper bounds. Theoretically, any value is possible, only with diminishing probability the farther away from the mean it is.
The more data points you have, the higher the probability of exceeding any wanted minimum/maximum.
So, when running this:
data test;
    do i = 1 to 1000;
        x1 = rand('normal',.5,.1);
        output;
    end;
run;
I stayed well between 0 and 1, but with 10 million iterations I exceeded the bounds.
Are you sure you don't want uniform distribution?
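If hard bounds are needed while keeping a roughly normal shape, one common approach is to redraw any value that falls outside the range (rejection sampling, which gives a truncated normal). The sketch below assumes the same mean of .5 and standard deviation of .1 as above; the seed and the dataset name test_trunc are arbitrary choices for illustration.

```sas
data test_trunc;
    call streaminit(12345);              /* arbitrary seed, for repeatable draws */
    do i = 1 to 1000;
        /* Redraw until the value lands inside [0, 1]. */
        do until (0 <= x1 <= 1);
            x1 = rand('normal', .5, .1);
        end;
        output;
    end;
run;
```

With these parameters a redraw is almost never needed, since 0 and 1 are five standard deviations from the mean. With a wider spread, such as rand("normal",0.5,0.5), many draws would be rejected and the result would look noticeably less bell-shaped.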
Awesome, this was exactly what I needed. I understand that the asymptotic nature of the bell curve doesn't allow for bounds, but that is okay, since 1 or 2 observations over 100% out of 100,000 is reasonable. Thanks!