nstdt
Quartz | Level 8

Hi,

I am trying to generate a single column of data (length N = 250, 1000, 2000) from a Beta distribution with specified skewness and kurtosis. I want a range of skewness and kurtosis values: skew = 0, 1, 2 and kurt = 3, 5, 7. Given values for Skew and Kurt, I am not able to back-solve for the Beta shape parameters A and B. Is there a formula to go from Skew and Kurt to A and B?

How can I get the data I want? I found this post here which is similar to my problem:

https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/Generating-a-non-normal-distribution-with...

 

Should I just use the RandFleishman code and modify the FLFUNC and FLDERIV functions to be the Beta function and its first derivative? I am not sure how to go about this.

/* Newton's method to find roots of a function.
    You must supply the FLFUNC and FLDERIV functions
    that compute the function and the Jacobian matrix.
    Input: x0 is the starting guess
           optn[1] = max number of iterations
           optn[2] = convergence criterion for || f ||
    Output: x contains the approximation to the root */
start Newton(x, x0, optn);
   maxIter = optn[1]; converge = optn[2];
   x = x0;
   f = FLFunc(x);
   do iter = 1 to maxIter while(max(abs(f)) > converge);
      J = FLDeriv(x); 
      delta = -solve(J, f);                    /* correction vector */
      x = x + delta;                           /* new approximation */
      f = FLFunc(x);         
   end;
   /* return missing if no convergence */
   if iter > maxIter then x = j(nrow(x0),ncol(x0),.);
finish Newton;

Are there other modifications I need to make to the RandFleishman code?

I am new to simulating data. 

 

@Rick_SAS  or others, please help! Thank you.

Rick_SAS
SAS Super FREQ

Some of the (skew, kurt) values that you mention are not reachable for a beta(a,b) distribution. Others are limiting cases for the beta but are not properly beta distributions. For example,

  • The pair (skew,kurt)=(2,3) is an impossible combination that cannot be obtained from ANY probability distribution.
  • The pair (0,3) specifies the moments for the normal distribution. The normal distribution is an asymptotic limit of the beta family when a=b and a -> infinity.
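
Both bullets can be checked numerically. The following is a minimal sketch, assuming the standard beta(a,b) formulas for skewness and excess kurtosis and the full-kurtosis convention (normal = 3). The data set and variable names (BetaLimit, exkurt, minKurt) are illustrative and do not come from the thread.

```sas
/* Sketch: moments of the symmetric beta(a,a) as a grows */
data BetaLimit;
do a = 1, 10, 100, 1000;
   b = a;
   /* standard beta(a,b) skewness and excess kurtosis */
   skew   = 2*(b-a)*sqrt(a+b+1) / ((a+b+2)*sqrt(a*b));
   exkurt = 6*((a-b)**2*(a+b+1) - a*b*(a+b+2)) / (a*b*(a+b+2)*(a+b+3));
   kurt   = exkurt + 3;            /* full kurtosis */
   output;
end;
run;

proc print data=BetaLimit noobs; run;

/* (skew,kurt)=(2,3): any distribution must satisfy kurt >= 1 + skew**2 */
data _null_;
skew = 2; kurt = 3;
minKurt = 1 + skew**2;
if kurt < minKurt then put "Infeasible: " kurt= "is below " minKurt=;
run;
```

For a=b=1 (the uniform distribution) the full kurtosis is 1.8, and it increases toward 3 as a grows, while the skewness stays at 0.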

Please read about the moment-ratio diagram, which shows the feasible (skew,kurt) values for common families. 

 

After you choose feasible (skew, kurt) values, then find the (a,b) values that correspond to them by solving the nonlinear equations that relate (a,b) to (skew, kurt). You can then simulate from the beta(a,b) distribution in each case.
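
For the beta family specifically, the two moment equations can be inverted in closed form, so an iterative solver is not strictly necessary. The sketch below uses that closed-form inversion in place of a nonlinear solver. It assumes that kurt is the full kurtosis (normal = 3) and that the target pair lies strictly inside the beta region, skew**2 - 2 < kurt - 3 < 1.5*skew**2. The data set and variable names (BetaParms, BetaSim, nu, d, SampleSize) and the seed are illustrative.

```sas
/* Sketch: recover beta(a,b) from (skew, full kurt), then simulate */
data BetaParms;
input skew kurt;
exkurt = kurt - 3;                         /* excess kurtosis */
/* keep only pairs that a proper beta distribution can attain */
if exkurt > skew**2 - 2 and exkurt < 1.5*skew**2 then do;
   nu = 3*(exkurt - skew**2 + 2) / (1.5*skew**2 - exkurt);   /* nu = a + b */
   if skew = 0 then do;
      a = nu/2;  b = nu/2;                 /* symmetric case */
   end;
   else do;
      d = 1 / sqrt(1 + 16*(nu+1) / ((nu+2)**2 * skew**2));
      /* a < b when skew > 0; a > b when skew < 0 */
      a = nu/2 * (1 - sign(skew)*d);
      b = nu/2 * (1 + sign(skew)*d);
   end;
   output;
end;
datalines;
1 3
2 7
;
run;

data BetaSim;
call streaminit(12345);                    /* arbitrary seed */
set BetaParms;
do SampleSize = 250, 1000, 2000;
   do i = 1 to SampleSize;
      x = rand("Beta", a, b);
      output;
   end;
end;
drop i nu d;
run;

/* PROC MEANS reports EXCESS kurtosis, so compare with exkurt, not kurt */
proc means data=BetaSim n mean skewness kurtosis;
class skew kurt SampleSize;
var x;
run;
```

For the pair (1,3) the formulas give a=0.5 and b=1.5. Sample skewness and kurtosis are noisy estimates, especially for N=250, so the values from PROC MEANS will only roughly match the targets.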

nstdt
Quartz | Level 8

Thank you, @Rick_SAS !

Yes, I suppose I have to find a nonlinear solver to get me the shape values a and b, given Skew and Kurtosis.

Not sure if I need a different post, but what if I go with a mixture of normals, like in the image below, but truncated on the left? I want something like that, but on an interval that is bounded on one side only. I would like to cross different skewness and kurtosis values: Skew in 0 to 3 and Kurt in (3,5,7).

[Image: nstdt_0-1712237267936.png]

(From Allison J. Ames, Brian C. Leventhal & Nnamdi C. Ezike (2020), "Monte Carlo Simulation in Item Response Theory Applications Using SAS," Measurement: Interdisciplinary Research and Perspectives, 18:2, 55-74, DOI: 10.1080/15366367.2019.1689762, https://doi.org/10.1080/15366367.2019.1689762)

 

I found these posts on your blog:

https://blogs.sas.com/content/iml/2019/04/29/normal-mixture-distribution-sas.html and "Implement the truncated normal distribution in SAS" (The DO Loop).

Sorry, too many questions. Any advice would help get me started. Thank you again!

Rick_SAS
SAS Super FREQ

I don't think you need a separate post. This thread is fine.

 

As I have already told you, it is impossible to have a probability distribution for which kurt=3 when skew>1.5. You can only use feasible pairs of (skew,kurt) values. In general, the feasible region is defined by kurt >= 1 + skew**2, where kurt is the full kurtosis (kurt=3 for the normal distribution); every pair with kurt < 1 + skew**2 is impossible. However, the Fleishman family cannot model the most extreme distributions. Here is some DATA step code to get only the feasible pairs that can be fit by the Fleishman family:

/* create (skew,kurt) values for skew >= 0 that can be fit by Fleishman family */
data FeasSkewKurt;
do skew = 0 to 2.4 by 0.2;
   do kurt = -2 to 10 by 0.5;   /* excess kurtosis (full kurtosis - 3) */
      /* keep only valid pairs */
      if kurt > (-1.2264489 + 1.6410373*skew**2) then output;
   end;
end;
run;
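
The same filter can be applied directly to the target values from the original question. This is a sketch under two assumptions: the targets (kurt = 3, 5, 7) are full kurtosis values, and the Fleishman bound is stated in terms of excess kurtosis, which the range starting at -2 suggests. The names TargetCheck, exkurt, Feasible, and Fleishman are illustrative.

```sas
/* Sketch: flag which target (skew,kurt) pairs are usable */
data TargetCheck;
do skew = 0, 1, 2;
   do kurt = 3, 5, 7;                       /* full kurtosis, as in the question */
      exkurt = kurt - 3;                    /* excess kurtosis */
      /* 1 if some probability distribution can attain the pair */
      Feasible  = (kurt >= 1 + skew**2);
      /* 1 if the pair is inside the Fleishman region (assumed excess scale) */
      Fleishman = (exkurt > -1.2264489 + 1.6410373*skew**2);
      output;
   end;
end;
run;

proc print data=TargetCheck noobs; run;
```

The pair (2,3) is flagged as infeasible because 3 < 1 + 2**2 = 5. Pairs that are feasible but fall outside the Fleishman region need a different family.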

 

 

The main question you need to answer is WHAT DISTRIBUTIONS do you want to simulate from? You originally said beta distributions, which are bounded. You can either choose from standard families (such as beta, gamma, lognormal,...) and try to get a wide range of (skew,kurt) values, or you can use a flexible family of distributions such as the Fleishman family or the Johnson system. Using a family such as truncated normals or a mixture of normals is going to greatly complicate your life, so I do not recommend using those families. (The problem is that it is hard to find parameter values for each (skew,kurt) pair when you use those distributions.)
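
To see why a standard two-parameter family covers only part of the (skew,kurt) plane, consider the gamma family as one example. Its skewness is 2/sqrt(shape) and its excess kurtosis is 6/shape, so choosing the skewness fixes the kurtosis. The sketch below assumes full kurtosis (normal = 3); the data set name, the seed, and the sample size are illustrative.

```sas
/* Sketch: the gamma family traces a single curve in the (skew,kurt) plane */
data GammaCurve;
call streaminit(2024);                      /* arbitrary seed */
do skew = 1, 2;
   shape = 4 / skew**2;                     /* from skew = 2/sqrt(shape) */
   kurt  = 3 + 1.5*skew**2;                 /* implied full kurtosis = 3 + 6/shape */
   do i = 1 to 1000;
      x = rand("Gamma", shape);             /* unit scale; scale does not affect skew or kurt */
      output;
   end;
end;
drop i;
run;

/* PROC MEANS reports EXCESS kurtosis, so compare with kurt - 3 */
proc means data=GammaCurve n skewness kurtosis;
class skew kurt;
var x;
run;
```

With skew=1 the gamma family forces kurt=4.5, and with skew=2 it forces kurt=9, so none of the targets kurt = 3, 5, 7 can be matched at those skewness values by using a gamma distribution.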

 

The basic idea of what you are trying to do is discussed and implemented in Simulating Data with SAS (Wicklin, 2013) in Chapter 16, "Moment Matching and the Moment-Ratio Diagram." In that chapter, I used the Fleishman family, but the same ideas apply to the Johnson system. I recommend either of those families. If you do not have access to that book, you can get the Fleishman functions for free from Appendix D, which is available at https://support.sas.com/en/books/authors/rick-wicklin.html. If you are comfortable using SAS/IML to simulate the data, you could then write the samples to a data set and use the simulated data anywhere in SAS.

nstdt
Quartz | Level 8

Thank you @Rick_SAS, for your patient answer and advice! Yes, the main problem is making sure the Skew and Kurtosis match up with each other within a legitimate distribution.

Very kind of you to help!
