Quartz | Level 8

## Simulating from a Beta distribution with specified skewness and kurtosis

Hi,

I am trying to generate a single column of data (length N = 250, 1000, 2000) from a Beta distribution with specified skewness and kurtosis. I want a range of values: skew = 0, 1, 2 and kurt = 3, 5, 7. Given values for Skew and Kurt, I am not able to back-solve for the Beta shape parameters A and B. Is there a formula to go from Skew and Kurt to A and B?

How can I get the data I want? I found this post here which is similar to my problem:

https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/Generating-a-non-normal-distribution-with...

Should I just use the RandFleishman code and modify the FLFUNC and FLDERIV functions to be the Beta function and its first derivative? I am not sure how to go about this.

```
/* Newton's method to find roots of a function.
   You must supply the FLFUNC and FLDERIV functions
   that compute the function and the Jacobian matrix.
   Input: x0 is the starting guess
          optn[1] = max number of iterations
          optn[2] = convergence criterion for || f ||
   Output: x contains the approximation to the root */
start Newton(x, x0, optn);
   maxIter = optn[1]; converge = optn[2];
   x = x0;
   f = FLFunc(x);
   do iter = 1 to maxIter while(max(abs(f)) > converge);
      J = FLDeriv(x);
      delta = -solve(J, f);     /* correction vector */
      x = x + delta;            /* new approximation */
      f = FLFunc(x);
   end;
   /* return missing if no convergence */
   if iter > maxIter then x = j(nrow(x0),ncol(x0),.);
finish Newton;
```

Are there other modifications I need to make to the RandFleishman code?

I am new to simulating data.

1 ACCEPTED SOLUTION

SAS Super FREQ

## Re: Simulating from a Beta distribution with specified skewness and kurtosis

I don't think you need a separate post. This thread is fine.

As I have already told you, it is impossible to have a probability distribution for which kurt=3 when skew>1.5. You can only use feasible pairs of (skew,kurt) values. In general, the feasible region is defined by kurt >= 1 + skew**2, where kurt is the full kurtosis (kurt=3 for the normal distribution); pairs below that bound are impossible. However, the Fleishman family cannot model the most extreme distributions. Here is some DATA step code to get only the feasible pairs that can be fit by the Fleishman family:

```
/* create (skew,kurt) values for skew >= 0 that can be fit by Fleishman family;
   here kurt is the excess kurtosis (normal = 0) */
data FeasSkewKurt;
do skew = 0 to 2.4 by 0.2;
   do kurt = -2 to 10 by 0.5;
      /* keep only valid pairs */
      if kurt > (-1.2264489 + 1.6410373*skew**2) then output;
   end;
end;
run;
```
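
The two bounds above are stated on different kurtosis scales, so it may help to write them side by side. In the sketch below, $\gamma$ is the skewness, $\kappa$ the full kurtosis, and $\kappa_e = \kappa - 3$ the excess kurtosis; these symbols are notation introduced here, and the coefficients are copied from the DATA step.

```latex
% General feasibility bound (any distribution), on both scales
\kappa \ge 1 + \gamma^2
\quad\Longleftrightarrow\quad
\kappa_e = \kappa - 3 \ge \gamma^2 - 2

% Fleishman region used in the DATA step (excess-kurtosis scale)
\kappa_e > -1.2264489 + 1.6410373\,\gamma^2
```

For example, a target of skew=1 and full kurtosis 5 corresponds to $\kappa_e = 2$, which exceeds $-1.2264489 + 1.6410373 \approx 0.415$, so that pair passes the Fleishman check. Targets given as kurt = 3, 5, 7 on the full scale should therefore be reduced by 3 before being compared with the DATA step output.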

The main question you need to answer is WHAT DISTRIBUTIONS do you want to simulate from? You originally said beta distributions, which are bounded. You can either choose from standard families (such as beta, gamma, lognormal,...) and try to get a wide range of (skew,kurt) values, or you can use a flexible family of distributions such as the Fleishman family or the Johnson system. Using a family such as truncated normals or a mixture of normals is going to greatly complicate your life, so I do not recommend using those families. (The problem is that it is hard to find parameter values for each (skew,kurt) pair when you use those distributions.)

The basic idea of what you are trying to do is discussed and implemented in Simulating Data with SAS (Wicklin, 2013) in Chapter 16 "Moment Matching and the Moment-Ratio Diagram." In that chapter, I used the Fleishman family, but the same ideas apply to the Johnson system. I recommend either of those families. If you do not have access to that book, you can get the Fleishman functions for free from Appendix D (available at https://support.sas.com/en/books/authors/rick-wicklin.html). If you are comfortable using SAS/IML to simulate the data, you could then write the samples to a data set and use the simulated data anywhere in SAS.

4 REPLIES
SAS Super FREQ

## Re: Simulating from a Beta distribution with specified skewness and kurtosis

Some of the (skew, kurt) values that you mention are not reachable for a beta(a,b) distribution. Others are limiting cases for the beta but are not properly beta distributions. For example,

• The pair (skew,kurt)=(2,3) is an impossible combination that is not obtainable by ANY probability distribution.
• The pair (0,3) specifies the moments for the normal distribution. The normal distribution is an asymptotic limit of the beta family when a=b and a -> infinity.
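
For reference, the standard moment formulas for the beta(a,b) distribution make the reachable region explicit. Here $\gamma$ denotes the skewness and $\kappa_e$ the excess kurtosis (full kurtosis minus 3); the symbols are notation introduced for this sketch.

```latex
\gamma = \frac{2\,(b-a)\sqrt{a+b+1}}{(a+b+2)\sqrt{ab}},
\qquad
\kappa_e = \frac{6\left[(a-b)^2(a+b+1) - ab\,(a+b+2)\right]}{ab\,(a+b+2)(a+b+3)}

% Region of (skewness, excess kurtosis) pairs reachable by some beta(a,b)
\gamma^2 - 2 \;<\; \kappa_e \;<\; \tfrac{3}{2}\,\gamma^2
```

With full kurtosis 3, the pair (2,3) has $\kappa_e = 0 < \gamma^2 - 2 = 2$, so it violates the lower bound, which is the bound that applies to every distribution. The pair (0,3) has $\kappa_e = 0 = \tfrac{3}{2}\gamma^2$, so it sits exactly on the upper boundary and is reached only in the limit.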

After you choose feasible (skew, kurt) values, then find the (a,b) values that correspond to them by solving the nonlinear equations that relate (a,b) to (skew, kurt). You can then simulate from the beta(a,b) distribution in each case.
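
Because the beta family has only two shape parameters, these equations can be inverted in closed form, so a general nonlinear solver is not strictly necessary. The following is a minimal sketch, assuming skew > 0, that kurt is supplied as full kurtosis (normal = 3), and that the pair lies strictly inside the beta region (skew**2 - 2 < kurt - 3 < 1.5*skew**2). The data set and variable names (BetaParams, BetaSim, tSkew, tKurt, nu, d) are illustrative, and the inversion is the standard method-of-moments result for the beta distribution rather than anything taken from the Fleishman code.

```sas
/* Sketch: closed-form beta shape parameters from target (skew, full kurtosis).
   Assumes tSkew > 0; infeasible pairs are silently dropped. */
data BetaParams;
   input tSkew tKurt;                 /* tKurt = full kurtosis (normal = 3) */
   exKurt = tKurt - 3;                /* excess kurtosis */
   /* the beta family requires skew**2 - 2 < exKurt < 1.5*skew**2 */
   if tSkew > 0 and exKurt > tSkew**2 - 2 and exKurt < 1.5*tSkew**2 then do;
      nu = 3*(exKurt - tSkew**2 + 2) / (1.5*tSkew**2 - exKurt);  /* nu = a + b */
      d  = 1 / sqrt(1 + 16*(nu+1) / ((nu+2)**2 * tSkew**2));
      a  = nu/2 * (1 - d);            /* positive skew implies a < b */
      b  = nu/2 * (1 + d);
      output;
   end;
datalines;
1 3
2 7
2 3
;

/* simulate N=2000 observations from each feasible beta(a,b) */
data BetaSim;
   call streaminit(12345);
   set BetaParams;
   do i = 1 to 2000;
      x = rand("Beta", a, b);
      output;
   end;
   keep tSkew tKurt a b x;
run;

/* compare sample moments with the targets (KURT here is excess kurtosis) */
proc means data=BetaSim N mean skew kurt;
   class tSkew tKurt;
   var x;
run;
```

For the targets (1,3) and (2,7), this gives approximately beta(0.5, 1.5) and beta(0.329, 2.671); the pair (2,3) is dropped because it is infeasible. Sample skewness and kurtosis are noisy, especially for N=250, so expect the PROC MEANS estimates to scatter around the targets rather than match them exactly.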

Quartz | Level 8

## Re: Simulating from a Beta distribution with specified skewness and kurtosis

Thank you, @Rick_SAS !

Yes, I suppose I have to find a nonlinear solver to get me the shape values a and b, given Skew and Kurtosis.

Not sure if I need a different post, but what if I choose to go with a mixture of normals, like in the image below, but truncated on the left? I am going for something like the image below, but on an interval that is bounded on one side only. I would like to cross different Skew and Kurtosis values: Skew in 0 to 3 and Kurt in (3,5,7).

(From Allison J. Ames, Brian C. Leventhal & Nnamdi C. Ezike (2020) Monte Carlo
Simulation in Item Response Theory Applications Using SAS, Measurement: Interdisciplinary
Research and Perspectives, 18:2, 55-74, DOI: 10.1080/15366367.2019.1689762
https://doi.org/10.1080/15366367.2019.1689762 )

I found this from your blog:

Sorry, too many questions. Any advice would help get me started. Thank you again!

Quartz | Level 8

## Re: Simulating from a Beta distribution with specified skewness and kurtosis

Thank you, @Rick_SAS, for your patient answer and advice! Yes, the main problem is making sure the Skew and Kurtosis match up with each other within a legitimate distribution.

Very kind of you to help!

From The DO Loop