AndersS
Lapis Lazuli | Level 10

Hi!  I am doing some computer experiments with Gumbel random numbers.
1) According to the books and manuals:

          CALL STREAMINIT(2);             X= RAND("GUMBEL", MY, SCALE);
An alternative:   Xi= MY - SCALE*LOG( -LOG(RAND('UNIFORM')));

These two seem to produce results that are identical.
I have used both to produce a lot of random numbers. Works fine. Then I calculated the Mean, StdDev and a lot of quantiles.

 

2) An alternative, when producing many random numbers is the following:    X= MY - SCALE*LOG(-LOG(Step));
where Step is the index in a do-loop. 5Mi values. Then I calculated the Mean, StdDev and a lot of quantiles.
The values are almost the same as above, with very small differences.

Fine - OR?

When I look at the values generated using 2) in some small intervals e.g. around the median the values look very smooth and nice.
When I do the same using the values produced by 1) the values are not at all that smooth, but vary up and down.

Question: Is this described in any paper? Are the values produced in 2) also random numbers?

I would appreciate some comments and advice on this matter.

Best Regards AndersS

Anders Sköllermo (Skollermo in English)
Accepted Solution
StatsMan
SAS Super FREQ

Your first two methods, calling RAND with the GUMBEL distribution and applying the inverse of the Gumbel cumulative distribution function to a RAND("UNIFORM") value, should return results that are very close, if not identical.
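A minimal sketch of the two random methods side by side, in case it helps. The data set name, the seed, the sample size, and the values MY=0 and SCALE=1 are assumptions chosen only for illustration.

```sas
/* Sketch: two ways to draw Gumbel random values.             */
/* MY=0, SCALE=1, the seed and the sample size are assumed.   */
data gumbel_random;
   call streaminit(2);
   my = 0;
   scale = 1;
   do i = 1 to 100000;
      x_rand = rand("GUMBEL", my, scale);     /* built-in Gumbel generator        */
      u      = rand("UNIFORM");               /* uniform draw strictly in (0,1)   */
      x_inv  = my - scale*log(-log(u));       /* inverse CDF transform of u       */
      output;
   end;
   keep x_rand x_inv;
run;

/* Compare summary statistics of the two columns */
proc means data=gumbel_random n mean std p5 median p95;
   var x_rand x_inv;
run;
```

The two columns use different uniform draws, so they will not match value by value; they should agree in distribution. For reference, the Gumbel mean is MY + 0.5772*SCALE and the standard deviation is SCALE*pi/sqrt(6), so with the values assumed here the summary should land near 0.577 and 1.283.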

 

The third method, generating observations from the Gumbel using a pre-defined set of probabilities (the values from your DO loop index) will not be considered by most to be a random sample from the Gumbel. That method will give you a very nicely shaped Gumbel curve, but there is no randomness in the selection of your probabilities. 
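As a sketch of what the pre-defined approach amounts to: the code below assumes the probabilities are the midpoints (i - 0.5)/n of n equal-width intervals, which may differ from the Step values used in the original program. MY=0, SCALE=1 and n=1000 are likewise assumed.

```sas
/* Sketch: Gumbel quantiles on a fixed grid of probabilities. */
/* No RAND call is involved, so every run gives the same      */
/* values: this is a table of quantiles, not a random sample. */
data gumbel_grid;
   my = 0;
   scale = 1;
   n = 1000;
   do i = 1 to n;
      p = (i - 0.5)/n;                   /* fixed probability, 0 < p < 1  */
      x = my - scale*log(-log(p));       /* exact Gumbel quantile at p    */
      output;
   end;
   keep p x;
run;
```

Because the p values are evenly spaced, the x values trace the quantile function smoothly, which is why these values look so regular around the median.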

 

There is a method that was quite popular in years past called Latin Hypercube sampling that might be of interest to you. Suppose you want to generate 100 random values from a distribution. Instead of using the pre-defined probabilities from the index of a do-loop, generate one random probability from each of the 100 equal-width probability intervals that cover the range 0-1. So, generate a probability on each of the intervals 0-.01, .01-.02, ... , .99-1.0. Use those probabilities with the inverse transform of the cumulative distribution function.
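A rough sketch of that idea for a single variable, under assumed values MY=0, SCALE=1, n=100 and an arbitrary seed. With only one variable this is plain stratified sampling; a full Latin Hypercube design with several variables would also permute the strata independently for each variable, which is not shown here.

```sas
/* Sketch: one random probability per equal-width stratum,    */
/* then the inverse CDF transform for the Gumbel.             */
data gumbel_strat;
   call streaminit(2);
   my = 0;
   scale = 1;
   n = 100;
   do i = 1 to n;
      /* uniform draw inside the i-th interval ((i-1)/n, i/n) */
      p = (i - 1 + rand("UNIFORM"))/n;
      x = my - scale*log(-log(p));
      output;
   end;
   keep i p x;
run;
```

The observations come out ordered by stratum, so shuffle them if the order matters in later steps.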

AndersS
Lapis Lazuli | Level 10
Hi! Many thanks! Yes I will. /Br AndersS
AndersS
Lapis Lazuli | Level 10
Hi! This sounds VERY interesting. Do you have any good paper to refer to?
MANY THANKS! /Br AndersS
StatsMan
SAS Super FREQ

The Technometrics article by McKay, Beckman, and Conover from 1979 introduced the idea of Latin Hypercube Sampling.

AndersS
Lapis Lazuli | Level 10
Hi! Many thanks. I will look at it. /Br AndersS
Rick_SAS
SAS Super FREQ

Your first approach generates random values by using the inverse CDF method: https://blogs.sas.com/content/iml/2013/07/22/the-inverse-cdf-method.html

 

Your second method does not generate any random values. You are simply generating the quantiles of the distribution (assuming 0 < step < 1). 

 

By definition, the quantiles are exact for the second method at the locations that you specify. The quantile ESTIMATES for the first method will be close to the exact values because you are using a large sample size (5M obs), by the Law of Large Numbers.
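A small sketch of that comparison, assuming the standard Gumbel (location 0, scale 1), an arbitrary seed, and a sample of one million values. The data set names are only illustrative.

```sas
/* Sketch: quantile ESTIMATES from a random sample ...        */
data sim;
   call streaminit(2);
   do i = 1 to 1000000;
      x = rand("GUMBEL", 0, 1);
      output;
   end;
   keep x;
run;

proc means data=sim p25 median p75;
   var x;
run;

/* ... versus the exact quantiles at the same probabilities   */
data exact;
   do p = 0.25, 0.5, 0.75;
      q = -log(-log(p));        /* exact standard Gumbel quantile */
      output;
   end;
run;

proc print data=exact noobs;
run;
```

The estimates from PROC MEANS should be close to the exact values but not equal to them, and they will change with the seed, whereas the exact quantiles never change.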

AndersS
Lapis Lazuli | Level 10
Hi Rick! I will read it. /Br AndersS
