I am trying to randomize the numbers in my dataset for a few different variables. The macro that I am using to do this is currently:
%macro RandBetween(min, max);
(&min + ((1+&max-&min)*abs(rand("normal"))))
%mend;
I think my understanding of a normal distribution isn't correct since %RandBetween(0,1) seems to be returning numbers greater than 1 as well. This obviously will return an incorrect 'percentage' variable since I want that to have a maximum value of 1 (100%). Consequently, other variables that are generated like this:
avg_opioid = %RandBetween(0,&max_avg_opioid.);
also seem to have values greater than their max (presumably because the rand("normal") function is returning numbers greater than 1). I have also tried rand("normal",0.5,0.5), but that doesn't seem to help either. At this point, I think my understanding of the normal distribution may be off.
How do I go about limiting the return of rand("normal") to a min and max of 0 and 1 respectively?
Accepted Solutions
The "problem" with the normal distribution is that it does not have defined lower/upper bounds. Theoretically, any value is possible, only with diminishing probability the farther away from the mean it is.
The more data points you have, the higher the probability of exceeding any wanted minimum/maximum.
So, when running this:
data test;
  do i = 1 to 1000;
    x1 = rand('normal',.5,.1);  /* mean 0.5, standard deviation 0.1 */
    output;
  end;
run;
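all values stayed well between 0 and 1 (with mean 0.5 and standard deviation 0.1, both bounds sit five standard deviations from the mean), but with 10 million iterations some values exceeded the bounds.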
Are you sure you don't want uniform distribution?
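If a normal shape is needed and no value may fall outside the bounds, one common option is to redraw any value that lands outside them (rejection sampling, which yields a truncated normal). The following is only a minimal sketch: the dataset name trunc, the variables mu, sigma, lo and hi, the seed, and the iteration count are all illustrative assumptions, not part of the original post.

```sas
/* Sketch: truncated normal on [lo, hi] by rejection sampling.      */
/* mu, sigma, lo, hi and the seed are illustrative assumptions.     */
data trunc;
  call streaminit(12345);            /* fixed seed for reproducibility */
  mu = 0.5; sigma = 0.1;
  lo = 0;   hi = 1;
  do i = 1 to 100000;
    /* the body runs at least once; redraw until x lies in [lo, hi] */
    do until (lo <= x <= hi);
      x = rand('normal', mu, sigma);
    end;
    output;
  end;
  keep i x;
run;
```

With the bounds five standard deviations from the mean, redraws are extremely rare, so the result is practically indistinguishable from the plain normal draw while guaranteeing that no observation falls outside 0 and 1. With a wider sigma, more draws are rejected and the distribution is visibly cut off at the bounds.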
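After the macro is resolved, %RandBetween(0,1) becomes
(0 + ((1+1-0)*abs(rand("normal"))))
so abs(rand("normal")) is multiplied by 2 instead of 1, which doubles the spread of the values.
To remove the extra 1, change your macro to this (note that abs(rand("normal")) has no upper bound, so this alone does not cap the result at max):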
%macro RandBetween(min, max);
(&min + ((&max-&min)*abs(rand("normal"))))
%mend;
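Because abs(rand("normal")) is itself unbounded (roughly 32% of standard normal draws exceed 1 in absolute value), a macro that only rescales can still return values above max. One possible variant, shown here purely as a sketch, centres the normal on the midpoint of the range, sets the standard deviation to one sixth of the range, and clamps the rare outliers to the bounds. The macro name RandBetweenClamped, the "range divided by 6" choice, and the seed are assumptions made for illustration.

```sas
/* Hypothetical variant: mean = midpoint, sd = range/6, clamped to [min, max]. */
%macro RandBetweenClamped(min, max);
  (max(&min, min(&max, rand("normal", (&min + &max)/2, (&max - &min)/6))))
%mend;

data demo;
  call streaminit(2024);             /* illustrative seed */
  do i = 1 to 1000;
    pct = %RandBetweenClamped(0, 1); /* always within [0, 1] */
    output;
  end;
run;
```

With this choice the bounds sit three standard deviations from the mean, so about 0.27% of draws are clamped and end up exactly on min or max. If that small pile-up at the edges is unwanted, redrawing out-of-range values is an alternative.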
Are you sure you want to create a 'percentage variable' using the normal distribution? An N(0,1) distribution is not restricted to values between 0 and 1. It is a normal distribution with mean 0 and variance 1.
If you do not actually need the normal distribution, then simply do this to get a value between 0 and 1:
data _null_;
x=rand('uniform');
put x;
run;
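For a range other than 0 to 1, the uniform draw can be rescaled in the same style as the original macro. This is a sketch only; the macro name UnifBetween and the seed are assumptions introduced for illustration.

```sas
/* Hypothetical helper: uniform value between min and max. */
%macro UnifBetween(min, max);
  (&min + (&max - &min) * rand("uniform"))
%mend;

data _null_;
  call streaminit(1);                /* illustrative seed */
  x = %UnifBetween(0, 1);
  put x=;
run;
```

Unlike the normal-based macro, this expression cannot leave the requested range, but every value in the range is equally likely, so it does not produce a bell-shaped distribution.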
Unfortunately, for the purposes of what I am trying to do with the data, I need it to be a normal distribution.
Awesome, this was exactly what I needed. I understand that the asymptotic nature of the bell curve doesn't allow for bounds, but that is okay since 1 or 2 observations over 100% out of 100,000 is reasonable. Thanks!