Hi! I am doing some computer experiments with Gumbel random numbers.
1) According to the books and manuals:
CALL STREAMINIT(2); X= RAND("GUMBEL", MY, SCALE);
An alternative: Xi= MY - SCALE*LOG( -LOG(RAND('UNIFORM')));
These two seem to produce results that are identical.
I have used both to produce a lot of random numbers. Works fine. Then I calculated the Mean, StdDev and a lot of quantiles.
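For anyone who wants to reproduce 1), the two generators can be put side by side roughly as in the sketch below. MY=0, SCALE=1, the sample size of 100000 and the dataset and variable names (gumbel_rand, x_rand, x_inv) are assumptions chosen only for illustration; the seed is the one from the STREAMINIT call above.

```sas
/* Sketch: draw Gumbel variates two ways in one DATA step.            */
/* MY, SCALE and the sample size are illustrative assumptions.        */
data gumbel_rand;
   call streaminit(2);
   MY = 0; SCALE = 1;
   do i = 1 to 100000;
      x_rand = rand("GUMBEL", MY, SCALE);              /* built-in generator      */
      x_inv  = MY - SCALE*log(-log(rand("UNIFORM")));  /* inverse-CDF transform   */
      output;
   end;
run;

/* Summary statistics for both columns */
proc means data=gumbel_rand n mean std median p5 p95;
   var x_rand x_inv;
run;
```

With these parameter values the theoretical mean is about 0.5772 and the standard deviation about 1.2825, so both columns should land close to those numbers.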
2) An alternative when producing many random numbers is the following: X= MY - SCALE*LOG(-LOG(Step));
where Step is the index in a do-loop, giving 5 million values. Then I calculated the Mean, StdDev and a lot of quantiles.
The values are almost the same as above, with very small differences.
Fine, or is it?
When I look at the values generated using 2) in some small intervals, e.g. around the median, the values look very smooth and nice.
When I do the same using the values produced by 1), the values are not at all that smooth, but vary up and down.
Question: Is this described in any paper? Are the values produced in 2) also random numbers?
I would appreciate some comments and advice on this matter.
Best Regards AndersS
I would expect neighbouring values to be close to some extent when they are driven by a "do loop". The index takes very specific increments, which limits the resulting output to small changes. That is built into the formula you use.
Look at the behavior of -log(step) by itself. You can see that as your "step" increases, the change in the function result gets smaller:
data junk;
   do i=1 to 100;
      y= -log(i);
      output;
   end;
run;
proc sgplot;
   scatter x=i y=y;
run;
I would not, personally, call method 2 "random" in any way. It is fully determined by the values of three variables (MY, SCALE and Step): plug in the same three values and you get the same result, hence it is not random. It may generate a range of values similar to a specific random number distribution, but for any of the traditional uses of random values I wouldn't touch that approach, at least not without considerable additions to the code.
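To make the difference visible, here is a rough sketch that puts the deterministic values next to sorted random draws around the median. It assumes the loop index is rescaled to a probability p = i/(n+1), because LOG(-LOG(Step)) is only defined for 0 < Step < 1; that rescaling, MY=0, SCALE=1, n=1000 and all dataset and variable names are my assumptions, not something stated in the question.

```sas
/* Sketch: method 2 as an evenly spaced probability grid vs. random draws. */
/* Assumption: index rescaled to p = i/(n+1) so that 0 < p < 1.            */
data compare;
   call streaminit(2);
   MY = 0; SCALE = 1; n = 1000;
   do i = 1 to n;
      p = i/(n + 1);
      x_grid = MY - SCALE*log(-log(p));     /* deterministic Gumbel quantile at p */
      x_rand = rand("GUMBEL", MY, SCALE);   /* pseudo-random draw                 */
      output;
   end;
run;

/* x_grid is already in increasing order; sort the random draws to match. */
proc sort data=compare(keep=x_rand) out=sorted_rand;
   by x_rand;
run;

/* One-to-one merge (no BY): i-th grid value next to i-th smallest draw. */
data both;
   merge compare(keep=i p x_grid) sorted_rand;
run;

/* Zoom in around the median (p = 0.5). */
proc sgplot data=both(where=(0.45 <= p <= 0.55));
   series  x=p y=x_grid;
   scatter x=p y=x_rand;
run;
```

The x_grid line is the quantile function evaluated on a regular grid, so it is smooth by construction and identical on every run. The sorted x_rand points scatter around that curve and change with the seed, which matches the up-and-down pattern described for method 1.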
To get random numbers with method 2, you would need something like:
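%let n=5000000;
data want;
   call streaminit(2);
   MY = 0; SCALE = 1;   /* example parameter values */
   array _x{&n.} _temporary_;
   /* Fill the lookup table once. step/(&n.+1) keeps the argument inside (0,1); */
   /* LOG(-LOG(step)) is undefined for an integer step >= 1.                    */
   if _n_ = 1 then do step = 1 to &n.;
      _x{step} = MY - SCALE*LOG(-LOG(step/(&n.+1)));
   end;
   /* Draw n values by picking table entries at random. */
   do step = 1 to &n.;
      X = _x{rand("integer", &n.)};
      output;
   end;
   keep X;
run;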
Those values would look almost OK as long as n is large, but the approach wouldn't save any significant CPU resources.