Hi! I am doing some computer experiments with Gumbel random numbers.
1) According to the books and manuals:
CALL STREAMINIT(2); X= RAND("GUMBEL", MY, SCALE);
An alternative: Xi= MY - SCALE*LOG( -LOG(RAND('UNIFORM')));
These two seem to produce results that are identical.
I have used both to produce a lot of random numbers. Works fine. Then I calculated the Mean, StdDev and a lot of quantiles.
2) An alternative, when producing many random numbers is the following: X= MY - SCALE*LOG(-LOG(Step));
where Step is the index in a do-loop, for 5 million values. Then I calculated the Mean, StdDev and a lot of quantiles.
The values are almost the same as above, with very small differences.
Fine, or is it?
When I look at the values generated using 2) in some small intervals e.g. around the median the values look very smooth and nice.
When I do the same using the values produced by 1) the values are not at all that smooth, but vary up and down.
Question: Is this described in any paper? Are the values produced in 2) also random numbers?
I would appreciate some comments and advice on this matter.
Best Regards AndersS
Your first two methods, calling RAND with the GUMBEL distribution and applying the inverse of the Gumbel cumulative distribution function to values from RAND with the UNIFORM distribution, should return results that are very close in distribution, if not identical.
The third method, generating observations from the Gumbel using a pre-defined set of probabilities (the values from your DO loop index), would not be considered by most people to be a random sample from the Gumbel distribution. That method will give you a very nicely shaped Gumbel curve, but there is no randomness in the selection of your probabilities.
There is a method that was quite popular in years past called Latin Hypercube sampling that might be of interest to you. Suppose you want to generate 100 random values from a distribution. Instead of using the pre-defined probabilities from the index of a do-loop, generate one random probability within each of the 100 equal-width probability intervals that cover the range 0-1. So, generate a probability on each of the intervals 0-.01, .01-.02, ... , .99-1.0. Use those probabilities with the inverse transform of the cumulative distribution function.
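The interval scheme described above could be sketched in a DATA step roughly as follows. This is only an illustration: the data set name lhs_gumbel, the variable names, and the values MY=0 and SCALE=1 are assumptions and do not come from the original post.

```sas
/* Sketch: one uniform draw inside each equal-width probability  */
/* interval, followed by the Gumbel inverse CDF transform.       */
/* MY, SCALE and N are example values chosen for illustration.   */
data lhs_gumbel;
   call streaminit(2);
   N = 100;                /* number of probability intervals */
   MY = 0; SCALE = 1;      /* assumed location and scale      */
   do i = 1 to N;
      /* random probability in the interval ((i-1)/N, i/N) */
      p = (i - 1 + rand("UNIFORM")) / N;
      X = MY - SCALE*log(-log(p));
      output;
   end;
   keep i p X;
run;
```

The values come out in increasing order because the intervals are visited in order. If the order matters for later steps, the observations can be shuffled afterwards.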
The 1979 Technometrics article by McKay, Beckman, and Conover introduced the idea of Latin Hypercube Sampling.
Your first approach generates random values by using the inverse CDF method: https://blogs.sas.com/content/iml/2013/07/22/the-inverse-cdf-method.html
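As a rough check, the two random generators from the question can be run side by side and summarized. The sketch below assumes MY=0 and SCALE=1 and uses a made-up data set name (gumbel_compare); it is not taken from the original post. The two columns use different random draws, so they should agree in distribution, not value by value.

```sas
/* Sketch: RAND("GUMBEL") versus the inverse CDF transform of a  */
/* uniform draw. MY, SCALE and the sample size are examples.     */
data gumbel_compare;
   call streaminit(2);
   MY = 0; SCALE = 1;      /* assumed location and scale */
   do i = 1 to 1000000;
      X1 = rand("GUMBEL", MY, SCALE);
      X2 = MY - SCALE*log(-log(rand("UNIFORM")));
      output;
   end;
   keep X1 X2;
run;

/* Summary statistics for both columns */
proc means data=gumbel_compare n mean std p5 median p95;
   var X1 X2;
run;
```

For MY=0 and SCALE=1 the theoretical mean is Euler's constant (about 0.5772), the standard deviation is pi/sqrt(6) (about 1.2825), and the median is -log(log(2)) (about 0.3665), so both columns should land near those values.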
Your second method does not generate any random values. You are simply generating the quantiles of the distribution (assuming 0 < step < 1).
By definition, the quantiles are exact for the second method at the locations that you specify. The quantile ESTIMATES for the first method will be close to the exact values because you are using a large sample size (5M obs), by the Law of Large Numbers.
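The difference between exact quantiles and quantile estimates can be seen with a small experiment like the one below. It is a sketch under assumptions: the probabilities are taken as (i - 0.5)/N so that they stay strictly between 0 and 1 (the original post does not say how Step is scaled), and the names gumbel_grid, Xgrid and Xrand are made up for the example.

```sas
/* Sketch: evenly spaced probabilities give exact quantiles      */
/* (Xgrid); random draws give estimates of them (Xrand).         */
/* N, MY and SCALE are example values.                           */
data gumbel_grid;
   call streaminit(2);
   N = 1000000;
   MY = 0; SCALE = 1;                    /* assumed location and scale */
   do i = 1 to N;
      p = (i - 0.5) / N;                 /* assumed grid, strictly in (0,1) */
      Xgrid = MY - SCALE*log(-log(p));   /* exact quantile at p   */
      Xrand = rand("GUMBEL", MY, SCALE); /* random draw           */
      output;
   end;
   keep p Xgrid Xrand;
run;

/* Compare quartiles of the grid values and of the random sample */
proc means data=gumbel_grid n p25 median p75;
   var Xgrid Xrand;
run;
```

The Xgrid values are a smooth, strictly increasing function of p, which is why they look so regular in any small interval. The Xrand values scatter around the same quantiles, and that scatter shrinks as the sample size grows.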