BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Amanda_Lemon
Quartz | Level 8

Hello, 

 

I am trying to generate a non-normal distribution with specified skewness and kurtosis. I learned that I can use Fleishman’s method for this purpose. Meaning, I need to compute the three parameters, and then I'll be able to simulate a distribution. I searched for a SAS code that would convert skewness and kurtosis values into the three parameters and found this link. 

 

https://support.sas.com/publishing/authors/extras/65378_Appendix_D_Functions_for_Simulating_Data_by_...

 

I assume that the code I need is under D2. But I can't figure out how it works... I copy-pasted the code into SAS but, as is, it doesn't work. What should I do to make it work? Is it the right code to begin with? 

 

Thank you in advance. 

 

proc iml;
/* Find the root of this function */
start FlFunc(x) global (g_target); 
/* g_target=(skewness, kurtosis) */
return ( Fleishman(x) - (1 // g_target[1] // g_target[2]) );
finish FlFunc;
/* derivatives of the Fleishman function */
start FlDeriv(x);
b = x[1]; c = x[2]; d = x[3];
b2 = b##2; c2 = c##2; d2 = d##2; bd = b#d;
df1db = 2#b + 6#d;
df1dc = 4#c;
df1dd = 6#b + 30#d;
df2db = 4#c#(b + 12#d);
df2dc = 2#(b2 + 24#bd + 105#d2 + 2);
df2dd = 4#c#(12#b + 105#d);
df3db = 24#(d + c2#(2#b + 28#d) + 48#d##3);
df3dc = 48#c#(1 + b2 + 28#bd + 141#d2);
df3dd = 24#(b + 28#b#c2 + 2#d#(12 + 48#bd + 141#c2 + 225#d2)+ d2#(48#b + 450#d));
J = (df1db || df1dc || df1dd) //
(df2db || df2dc || df2dd) //
(df3db || df3dc || df3dd);
return( J );
finish FlDeriv;
/* Newton's method to find roots of a function.
You must supply the functions that compute
the function and the Jacobian matrix.
Input: x0 is the starting guess
optn[1] = max number of iterations
optn[2] = convergence criterion for || f ||
Output: x contains the approximation to the root */
start Newton(x, x0, optn);
maxIter = optn[1]; converge = optn[2];
x = x0;
f = FlFunc(x);
do iter = 1 to maxIter while(max(abs(f)) > converge);
J = FlDeriv(x);
delta = -solve(J, f);                   
/* correction vector */
x = x + delta;           
/* new approximation */
f = FlFunc(x);
end;
/* return missing if no convergence */
if iter > maxIter then x = j(nrow(x0),ncol(x0),.);
finish Newton;
/*Given (skew, kurt), produce an initial guess of the Fleishman
coefficients to use for Newton's algorithm.  This guess is
produced by a polynomial regression.*/
start FleishmanIC(skew, kurt);
c = j(3,1);
c[1] = 0.95357 -0.05679*kurt + 0.03520*skew##2 + 0.00133*kurt##2; /*c1 = Quad(skew, kurt)*/
c[2] = 0.10007*skew + 0.00844*skew##3;       /*c2 = Cubic(skew)*/
c[3] = 0.30978 -0.31655*c[1];                /*c3 = Linear(c1)*/
return (c);
finish;
start FitFleishmanFromSK(skew, kurt) global (g_target);
/*1. check that (skew,kurt) is in the feasible region*/
if kurt < -1.13168+1.58837*skew##2 then return({. . . .});
/*2. Initial guess for nonlinear root finding*/
x0 = FleishmanIC(skew, kurt);
optn = {25, 1e-5};   /*maximum iterations, convergence criterion*/
/*3. Find cubic coefficients (c1, c2, c3) so that -c2+c1*Z+c2*Z##2+c3*Z##3 has target (skew,kurt).*/
g_target = skew || kurt;                  /*set global variable*/
run Newton(coef, x0, optn);               /*find (c1, c2, c3)*/
return( -coef[2] // coef );
finish;
/*test the Fleishman functions*/
c = FitFleishmanFromSK(1.15, 2);
print c[rowname=("c0":"c3")];
1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

> Did you mean Section 16.8 Fleishman's Method? 

Yes

 

> I need to generate a distribution with the specified skewness and kurtosis. How should I change the code in this case? 

That's section 16.10. The program on p. 316 includes the statements:

/* 1. Create equally spaced grid in (skewness, kurtosis) space:
{0, 0.2, 0.4,..., 2.4} x {-2, -1.5, -1,..., 10} */
sk = Expand2DGrid( do(0,2.4,0.2), do(-2,10,0.5) );
skew = sk[,1]; kurt = sk[,2];

/* 1a. Discard invalid pairs */
idx = loc(kurt > (-1.2264489 + 1.6410373# skew##2));
sk = sk[idx, ]; /* keep these values */
skew = sk[,1]; kurt = sk[,2];

...

c = FitFleishmanFromSK(skew[i], kurt[i]); /* find Fleishman coefs */

You can use the method in that section to systematically explore (skewness, kurtosis) space. 

View solution in original post

10 REPLIES 10
Rick_SAS
SAS Super FREQ

You didn't define (or load) the Fleishman function.

 

Amanda_Lemon
Quartz | Level 8
OK... so, how (and where) do I do that?
I thought it loads at the end where it says that skewness is 1.15 and kurtosis is 2, and asks to return c values.
Rick_SAS
SAS Super FREQ

The module is defined in the middle of page 358. 

 

You can also get the complete code directly (rather than cut/paste from a PDF file) from the book's web page. Go to 

https://support.sas.com/en/books/authors/rick-wicklin.html

and click on "Example Code and Data."

Amanda_Lemon
Quartz | Level 8
Thank you for the code! That helps!

I will risk looking stupid but I still don't understand where in the code I need to put my numbers for skewness and kurtosis so that the c coefficients for these values will be displayed...
Rick_SAS
SAS Super FREQ

There is a complete worked example in section 16.6 of Simulating Data with SAS, p 311-314. Also see Section 16.10, which describes how to perform a systematic simulation study by varying skewness and kurtosis over a grid of values.

Amanda_Lemon
Quartz | Level 8

Thank you! Did you mean Section 16.8 Fleishman's Method? I am using an online version of the book through my university's library, and it doesn't have page numbers. 

 

I am slowly figuring things out... So, I understood that I need to save the RandFleishman.sas file, create a new sas file, and call for the RandFleishman.sas file there. I read the code that you provide in Section 16.8 in the book but, as far as I understood, it fits the model to the data. But I don't have data. I just need to generate a distribution with the specified skewness and kurtosis. How should I change the code in this case? 

 

Thank you again. 

Rick_SAS
SAS Super FREQ

> Did you mean Section 16.8 Fleishman's Method? 

Yes

 

> I need to generate a distribution with the specified skewness and kurtosis. How should I change the code in this case? 

That's section 16.10. The program on p. 316 includes the statements:

/* 1. Create equally spaced grid in (skewness, kurtosis) space:
{0, 0.2, 0.4,..., 2.4} x {-2, -1.5, -1,..., 10} */
sk = Expand2DGrid( do(0,2.4,0.2), do(-2,10,0.5) );
skew = sk[,1]; kurt = sk[,2];

/* 1a. Discard invalid pairs */
idx = loc(kurt > (-1.2264489 + 1.6410373# skew##2));
sk = sk[idx, ]; /* keep these values */
skew = sk[,1]; kurt = sk[,2];

...

c = FitFleishmanFromSK(skew[i], kurt[i]); /* find Fleishman coefs */

You can use the method in that section to systematically explore (skewness, kurtosis) space. 

Amanda_Lemon
Quartz | Level 8

Thank you! I think I figured out how it works. I didn't use a grid for now, and I also didn't use the RandFleishman function (it creates a matrix, and I haven't figured out yet how to use it). But I think the code below works correctly (and I am sure it looks very clumsy but I am just learning...). But I am curious about why my skewness and kurtosis values have such a big range and variance (especially kurtosis)... I thought in all samples these values will be quite close to the specified. 

 

/* Define and store the Fleishman modules */
%include "C:\...\RandFleishman.sas";
%let NC = 500; 
%let N_samples = 20;  
%let skew = 1.5;
%let kurt = 3;
proc iml;
load module=_all_;                         /* load Fleishman modules */
c = FitFleishmanFromSK(&skew, &kurt); /* find Fleishman coefs */
create DD var {"c"}; append; close;
quit;
proc transpose data=DD
               out=FF;
run;
proc sql;
create table GG as
select 
   COL1 as c1,
   COL2 as c2, 
   COL3 as c3, 
   COL4 as c4
from FF;
quit;
data MC (keep = sampleID X);
set GG; 
call streaminit(12345678);
do sampleID = 1 to &N_samples;
  do i = 1 to &NC;
    N1 = rand("Normal");
    X = c1 + c2*N1 + c3*(N1**2) + c4*(N1**3);
    output;
  end;
end;
run;
proc univariate data=MC noprint;
  by sampleID; 
  var X; 
  output out = OUTSTAT mean = mean std = SD skewness = skew kurtosis = kurt min = min max = max; 
run;
proc print data = OUTSTAT;
run; 

Untitled1.jpg

 

Rick_SAS
SAS Super FREQ

Your question is addressed throughout Chapter 16. In particular, read the sections on the moment ration diagram, and look at Fig 16..6  and 16.9. The paragraph in section 16.7 after Fig 16.10 directly answers your question.

 

I don't have a textbook nearby, but I think I recall that :

The standard error of the sample mean is proportional to the variance (second central moment) of the distribution.

The standard error of the sample variance is proportional to the kurtosis (fourth central moment) of the distribution.

The standard error of the sample skewness is proportional to the sixth central moment of the distribution.

The standard error of the sample kurtosis is proportional to the eighth central moment of the distribution.

 

In short, higher-order statistic like skewness and kurtosis will be much more variable than the lower-order statistics (mean, variance). When you simulate from a known distribution, the sample statistics for skewness and kurtosis might be quite far away from the population values (especially in small samples) whereas the sample means and .variances tend to be quite close to the population values.

Amanda_Lemon
Quartz | Level 8
That makes sense. Thank you, Rick!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 10 replies
  • 2913 views
  • 0 likes
  • 2 in conversation