sas_user_1001
Obsidian | Level 7

I am using the rand function to pull simulated values from a distribution. Right now, I have it set as rand('Normal', Mu, Sigma); however, I was wondering if there was a way to specify a distribution and input its mean, std. dev., skew, and kurtosis. This way the distribution from which I simulate the data resembles its empirical distribution. Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

Yes, there are several ways to do this. Some researchers use the moment-ratio diagram to find a distribution whose (skewness, kurtosis) values are close to the target values, then sample from that distribution. See The moment-ratio diagram - The DO Loop (sas.com). A chapter of the book Simulating Data with SAS shows how to implement this idea in SAS 9.4.

 

If you have SAS Viya, you can use PROC SIMSYSTEM, which enables you to specify the moments and will output the simulated samples.

 

If you have actual data, I suggest you model the data by using the Johnson system of distributions, which is conceptually the easiest flexible system. You can use PROC UNIVARIATE to perform the fit. If your data are bounded (for example, test scores that are between 0 and 100), use the Johnson SB distribution. See The Johnson SB distribution - The DO Loop (sas.com).

If your data are unbounded, use the Johnson SU system. See The Johnson SU distribution - The DO Loop (sas.com)   

(There is also an algorithm for deciding between the SB and SU families; see The Johnson system: Which distribution should you choose to model data? - The DO Loop (sas.com).) After you have decided on a system and fit the parameters to the data, you can use the DATA step programs in those articles to produce random samples from the model.
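
As a rough sketch of that last step (this is not the code from the linked articles), a Johnson SU variate can be generated by transforming a standard normal draw with the inverse of the SU normalizing transformation, X = theta + sigma*sinh((Z - gamma)/delta). The parameter values and the data set name below are made-up placeholders; in practice you would substitute the estimates that PROC UNIVARIATE reports.

```sas
/* Sketch: simulate from a Johnson SU model by transforming N(0,1) draws. */
/* The four parameter values are hypothetical, for illustration only.     */
data SimSU(keep=x);
   call streaminit(12345);           /* make the random stream reproducible */
   theta = 10;                       /* location (assumed value)            */
   sigma = 2;                        /* scale    (assumed value)            */
   gamma = -0.5;                     /* shape    (assumed value)            */
   delta = 1.5;                      /* shape    (assumed value)            */
   do i = 1 to 1000;
      z = rand("Normal");            /* standard normal variate             */
      x = theta + sigma * sinh((z - gamma) / delta);
      output;
   end;
run;

/* Compare the sample moments with the moments you were targeting */
proc means data=SimSU n mean std skewness kurtosis;
   var x;
run;
```

For bounded data, the SB case uses the same idea with a logistic transformation in place of SINH.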


9 REPLIES 9
PaigeMiller
Diamond | Level 26

There are no random number functions that allow you to specify the mean, variance, skewness, and kurtosis.

 

You could sample from the actual distribution to get something that resembles the distribution with whatever its mean, variance, skew and kurtosis are.
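
One way to do that resampling, sketched here under the assumption that the observed values are in a data set called my_data, is to draw a bootstrap sample (sampling with replacement) by using PROC SURVEYSELECT. The output data set name and the seed are arbitrary choices.

```sas
/* Sketch: resample the observed data with replacement (bootstrap). */
/* my_data is an assumed input data set; Resample is a made-up name. */
proc surveyselect data=my_data out=Resample
     method=urs        /* unrestricted random sampling = with replacement */
     samprate=1        /* draw as many observations as the input has      */
     outhits           /* write one row each time an observation is hit   */
     seed=12345;       /* reproducible sample                             */
run;
```

Because the sample is drawn from the empirical distribution itself, its mean, variance, skewness, and kurtosis will be close to those of the data, although no value outside the observed data can ever appear.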

--
Paige Miller

sas_user_1001
Obsidian | Level 7

Wonderful--thanks for the information!

sas_user_1001
Obsidian | Level 7

As a follow-up, once you have the location, scale, and shape parameters of the distribution from using proc univariate, is there a way to assign them to a variable name in the same way we can do it with SAS summary statistics (e.g., max = name_for_max)? Or do I have to assign the value manually in the code (e.g., theta = 1.12345)? Ideally, I would like the code to read: theta = name_for_theta, and call up the variable name when needed using &name_for_theta.

Thanks for your assistance here, it has helped immensely!

Rick_SAS
SAS Super FREQ

once you have the location, scale, and shape parameters of the distribution from using proc univariate, is there a way to assign them to a variable name in the same way we can do it with SAS summary statistics (e.g., max = name_for_max)? Or do I have to assign the value manually in the code (e.g., theta = 1.12345)?

What "code" are you referring to? Show us your syntax.

The doc for the HISTOGRAM statement in PROC UNIVARIATE specifies that the syntax for the options to the parametric density keywords is of the form

THETA=value-list

SIGMA=value-list

so the option requires specifying a list of values, not a variable name.

 

You could use CALL SYMPUTX in a DATA _NULL_ step to assign values in a data set to macro variables. But that trick doesn't work if you are using BY-group processing.

 

sas_user_1001
Obsidian | Level 7

Sorry, I should have included the code instead of having you guess at what I'm doing...

 

proc univariate data = my_data;
   var x_var;
   histogram x_var / SB(theta = &x_var_Min_rounddn, sigma = &x_var_Max_roundup, fitmethod = moments)
   endpoints = (&x_var_Min_rounddn to &x_var_Max_roundup by 1);

output out = temp_file_1
   mean = x_var_Mean
   std = x_var_Std
   delta = dist_delta
   gamma = dist_gamma;
quit;

 

data _null_;
   set temp_file_1;
   call symput('x_var_Mean', x_var_Mean);
   call symput('x_var_Std', x_var_Std);
   call symput('dist_delta', dist_delta);
   call symput('dist_gamma', dist_gamma);

run;

 

What I was hoping to do was capture the delta and gamma parameter values (the delta = and gamma = lines in the OUTPUT statement above) so I can call them up later as &dist_delta and &dist_gamma. I don't think this is possible, but you know way more than I do about SAS.

Rick_SAS
SAS Super FREQ

Yes.

Assuming the method of moments converges, PROC UNIVARIATE will create the ParameterEstimates table. You can write any SAS table to a data set by using the ODS OUTPUT statement. See ODS OUTPUT: Store any statistic created by any SAS procedure - The DO Loop

 


/* Johnson SB(threshold=theta, scale=sigma, shape=gamma, shape=delta) */
data SB(keep= X);
call streaminit(1);
do i = 1 to 1000;
   x = rand("Lognormal", 0, 0.4);
   output;
end;
run; 

 /* set bins: https://blogs.sas.com/content/iml/2014/08/25/bins-for-histograms.html */
proc univariate data=SB;
   histogram X / SB(theta=0 sigma=145 fitmethod=moments);
   ods output ParameterEstimates = PE;
   output out = temp_file_1  mean = x_var_Mean std = x_var_Std;
run;


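/* CALL SYMPUTX converts numeric values without a log note and trims blanks */
data _null_;
   set temp_file_1;
   call symputx('x_var_Mean', x_var_Mean);
   call symputx('x_var_Std', x_var_Std);
run;

/* copy the fitted shape parameters from the ParameterEstimates table */
data _null_;
   set PE;
   if symbol='Delta' then
      call symputx('dist_delta', Estimate);
   if symbol='Gamma' then
      call symputx('dist_gamma', Estimate);
run;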

%put &=x_var_Mean;
%put &=x_var_Std;
%put &=dist_delta;
%put &=dist_gamma;
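
Once the macro variables exist, they can be used in a later DATA step. The following is only a sketch of one possible use, assuming the program above has run so that &dist_gamma and &dist_delta are defined, and reusing the fixed theta=0 and sigma=145 from the HISTOGRAM statement. It simulates from the fitted SB model by applying the inverse of the SB normalizing transformation to standard normal draws. The data set name SimSB and the seed are made up.

```sas
/* Sketch: simulate from the fitted Johnson SB model by using the macro */
/* variables created above. theta and sigma match the HISTOGRAM options. */
data SimSB(keep=x);
   call streaminit(2);
   theta = 0;                        /* threshold fixed in the fit          */
   sigma = 145;                      /* scale fixed in the fit              */
   gamma = (&dist_gamma);            /* parentheses guard a negative value  */
   delta = (&dist_delta);
   do i = 1 to 1000;
      z = rand("Normal");            /* standard normal variate             */
      /* x always falls in the interval (theta, theta + sigma) */
      x = theta + sigma / (1 + exp(-(z - gamma) / delta));
      output;
   end;
run;
```
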
sas_user_1001
Obsidian | Level 7
Oh, this is a wonderful solution! Thanks so much for the help!
