Solved: Generating a non-normal distribution with specified skewness and kurto...

Amanda_Lemon · Posted 03-04-2018 05:15 AM

Hello,

I am trying to generate a non-normal distribution with specified skewness and kurtosis. I learned that I can use Fleishman’s method for this purpose. Meaning, I need to compute the three parameters, and then I'll be able to simulate a distribution. I searched for a SAS code that would convert skewness and kurtosis values into the three parameters and found this link.

https://support.sas.com/publishing/authors/extras/65378_Appendix_D_Functions_for_Simulating_Data_by_...

I assume that the code I need is under D2. But I can't figure out how it works... I copy-pasted the code into SAS but, as is, it doesn't work. What should I do to make it work? Is it the right code to begin with?

Thank you in advance.

proc iml;
/* Find the root of this function */
start FlFunc(x) global (g_target); 
/* g_target=(skewness, kurtosis) */
return ( Fleishman(x) - (1 // g_target[1] // g_target[2]) );
finish FlFunc;
/* derivatives of the Fleishman function */
start FlDeriv(x);
b = x[1]; c = x[2]; d = x[3];
b2 = b##2; c2 = c##2; d2 = d##2; bd = b#d;
df1db = 2#b + 6#d;
df1dc = 4#c;
df1dd = 6#b + 30#d;
df2db = 4#c#(b + 12#d);
df2dc = 2#(b2 + 24#bd + 105#d2 + 2);
df2dd = 4#c#(12#b + 105#d);
df3db = 24#(d + c2#(2#b + 28#d) + 48#d##3);
df3dc = 48#c#(1 + b2 + 28#bd + 141#d2);
df3dd = 24#(b + 28#b#c2 + 2#d#(12 + 48#bd + 141#c2 + 225#d2)+ d2#(48#b + 450#d));
J = (df1db || df1dc || df1dd) //
(df2db || df2dc || df2dd) //
(df3db || df3dc || df3dd);
return( J );
finish FlDeriv;
/* Newton's method to find roots of a function.
You must supply the functions that compute
the function and the Jacobian matrix.
Input: x0 is the starting guess
optn[1] = max number of iterations
optn[2] = convergence criterion for || f ||
Output: x contains the approximation to the root */
start Newton(x, x0, optn);
maxIter = optn[1]; converge = optn[2];
x = x0;
f = FlFunc(x);
do iter = 1 to maxIter while(max(abs(f)) > converge);
J = FlDeriv(x);
delta = -solve(J, f);                   
/* correction vector */
x = x + delta;           
/* new approximation */
f = FlFunc(x);
end;
/* return missing if no convergence */
if iter > maxIter then x = j(nrow(x0),ncol(x0),.);
finish Newton;
/*Given (skew, kurt), produce an initial guess of the Fleishman
coefficients to use for Newton's algorithm.  This guess is
produced by a polynomial regression.*/
start FleishmanIC(skew, kurt);
c = j(3,1);
c[1] = 0.95357 -0.05679*kurt + 0.03520*skew##2 + 0.00133*kurt##2; /*c1 = Quad(skew, kurt)*/
c[2] = 0.10007*skew + 0.00844*skew##3;       /*c2 = Cubic(skew)*/
c[3] = 0.30978 -0.31655*c[1];                /*c3 = Linear(c1)*/
return (c);
finish;
start FitFleishmanFromSK(skew, kurt) global (g_target);
/*1. check that (skew,kurt) is in the feasible region*/
if kurt < -1.13168+1.58837*skew##2 then return({. . . .});
/*2. Initial guess for nonlinear root finding*/
x0 = FleishmanIC(skew, kurt);
optn = {25, 1e-5};   /*maximum iterations, convergence criterion*/
/*3. Find cubic coefficients (c1, c2, c3) so that -c2+c1*Z+c2*Z##2+c3*Z##3 has target (skew,kurt).*/
g_target = skew || kurt;                  /*set global variable*/
run Newton(coef, x0, optn);               /*find (c1, c2, c3)*/
return( -coef[2] // coef );
finish;
/*test the Fleishman functions*/
c = FitFleishmanFromSK(1.15, 2);
print c[rowname=("c0":"c3")];

Rick_SAS · Posted 03-05-2018 07:38 PM

> Did you mean Section 16.8 Fleishman's Method?

Yes

> I need to generate a distribution with the specified skewness and kurtosis. How should I change the code in this case?

That's section 16.10. The program on p. 316 includes the statements:

/* 1. Create equally spaced grid in (skewness, kurtosis) space:
{0, 0.2, 0.4,..., 2.4} x {-2, -1.5, -1,..., 10} */
sk = Expand2DGrid( do(0,2.4,0.2), do(-2,10,0.5) );
skew = sk[,1]; kurt = sk[,2];

/* 1a. Discard invalid pairs */
idx = loc(kurt > (-1.2264489 + 1.6410373# skew##2));
sk = sk[idx, ]; /* keep these values */
skew = sk[,1]; kurt = sk[,2];

...

c = FitFleishmanFromSK(skew[i], kurt[i]); /* find Fleishman coefs */

You can use the method in that section to systematically explore (skewness, kurtosis) space.

View solution in original post

Rick_SAS · Posted 03-04-2018 07:06 PM

You didn't define (or load) the Fleishman function.

Amanda_Lemon · Posted 03-04-2018 07:56 PM

OK... so, how (and where) do I do that?
I thought it loads at the end where it says that skewness is 1.15 and kurtosis is 2, and asks to return c values.

Rick_SAS · Posted 03-04-2018 08:22 PM

The module is defined in the middle of page 358.

You can also get the complete code directly (rather than cut/paste from a PDF file) from the book's web page. Go to

https://support.sas.com/en/books/authors/rick-wicklin.html

and click on "Example Code and Data."

Amanda_Lemon · Posted 03-04-2018 08:56 PM

Thank you for the code! That helps!

I will risk looking stupid but I still don't understand where in the code I need to put my numbers for skewness and kurtosis so that the c coefficients for these values will be displayed...

Rick_SAS · Posted 03-04-2018 09:22 PM

There is a complete worked example in section 16.6 of Simulating Data with SAS, p 311-314. Also see Section 16.10, which describes how to perform a systematic simulation study by varying skewness and kurtosis over a grid of values.

Amanda_Lemon · Posted 03-05-2018 06:39 PM

Thank you! Did you mean Section 16.8 Fleishman's Method? I am using an online version of the book through my university's library, and it doesn't have page numbers.

I am slowly figuring things out... So, I understood that I need to save the RandFleishman.sas file, create a new sas file, and call for the RandFleishman.sas file there. I read the code that you provide in Section 16.8 in the book but, as far as I understood, it fits the model to the data. But I don't have data. I just need to generate a distribution with the specified skewness and kurtosis. How should I change the code in this case?

Thank you again.

Rick_SAS · Posted 03-05-2018 07:38 PM

> Did you mean Section 16.8 Fleishman's Method?

Yes

> I need to generate a distribution with the specified skewness and kurtosis. How should I change the code in this case?

That's section 16.10. The program on p. 316 includes the statements:

/* 1. Create equally spaced grid in (skewness, kurtosis) space:
{0, 0.2, 0.4,..., 2.4} x {-2, -1.5, -1,..., 10} */
sk = Expand2DGrid( do(0,2.4,0.2), do(-2,10,0.5) );
skew = sk[,1]; kurt = sk[,2];

/* 1a. Discard invalid pairs */
idx = loc(kurt > (-1.2264489 + 1.6410373# skew##2));
sk = sk[idx, ]; /* keep these values */
skew = sk[,1]; kurt = sk[,2];

...

c = FitFleishmanFromSK(skew[i], kurt[i]); /* find Fleishman coefs */

You can use the method in that section to systematically explore (skewness, kurtosis) space.

Amanda_Lemon · Posted 03-05-2018 09:50 PM

Thank you! I think I figured out how it works. I didn't use a grid for now, and I also didn't use the RandFleishman function (it creates a matrix, and I haven't figured out yet how to use it). But I think the code below works correctly (and I am sure it looks very clumsy but I am just learning...). But I am curious about why my skewness and kurtosis values have such a big range and variance (especially kurtosis)... I thought in all samples these values will be quite close to the specified.

/* Define and store the Fleishman modules */
%include "C:\...\RandFleishman.sas";
%let NC = 500; 
%let N_samples = 20;  
%let skew = 1.5;
%let kurt = 3;
proc iml;
load module=_all_;                         /* load Fleishman modules */
c = FitFleishmanFromSK(&skew, &kurt); /* find Fleishman coefs */
create DD var {"c"}; append; close;
quit;
proc transpose data=DD
               out=FF;
run;
proc sql;
create table GG as
select 
   COL1 as c1,
   COL2 as c2, 
   COL3 as c3, 
   COL4 as c4
from FF;
quit;
data MC (keep = sampleID X);
set GG; 
call streaminit(12345678);
do sampleID = 1 to &N_samples;
  do i = 1 to &NC;
    N1 = rand("Normal");
    X = c1 + c2*N1 + c3*(N1**2) + c4*(N1**3);
    output;
  end;
end;
run;
proc univariate data=MC noprint;
  by sampleID; 
  var X; 
  output out = OUTSTAT mean = mean std = SD skewness = skew kurtosis = kurt min = min max = max; 
run;
proc print data = OUTSTAT;
run;

Rick_SAS · Posted 03-06-2018 06:03 AM

Your question is addressed throughout Chapter 16. In particular, read the sections on the moment ration diagram, and look at Fig 16..6 and 16.9. The paragraph in section 16.7 after Fig 16.10 directly answers your question.

I don't have a textbook nearby, but I think I recall that :

The standard error of the sample mean is proportional to the variance (second central moment) of the distribution.

The standard error of the sample variance is proportional to the kurtosis (fourth central moment) of the distribution.

The standard error of the sample skewness is proportional to the sixth central moment of the distribution.

The standard error of the sample kurtosis is proportional to the eighth central moment of the distribution.

In short, higher-order statistic like skewness and kurtosis will be much more variable than the lower-order statistics (mean, variance). When you simulate from a known distribution, the sample statistics for skewness and kurtosis might be quite far away from the population values (especially in small samples) whereas the sample means and .variances tend to be quite close to the population values.

Amanda_Lemon · Posted 03-10-2018 03:39 AM

That makes sense. Thank you, Rick!

Generating a non-normal distribution with specified skewness and kurtosis

Re: Generating a non-normal distribution with specified skewness and kurtosis

Re: Generating a non-normal distribution with specified skewness and kurtosis

Re: Generating a non-normal distribution with specified skewness and kurtosis

Re: Generating a non-normal distribution with specified skewness and kurtosis

Re: Generating a non-normal distribution with specified skewness and kurtosis

Re: Generating a non-normal distribution with specified skewness and kurtosis

Re: Generating a non-normal distribution with specified skewness and kurtosis

Re: Generating a non-normal distribution with specified skewness and kurtosis

Re: Generating a non-normal distribution with specified skewness and kurtosis

Re: Generating a non-normal distribution with specified skewness and kurtosis

Re: Generating a non-normal distribution with specified skewness and kurtosis

Ready to join fellow brilliant minds for the SAS Hackathon?