Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 03-04-2018 05:15 AM
(3574 views)

Hello,

I am trying to generate a non-normal distribution with specified skewness and kurtosis. I learned that I can use Fleishman’s method for this purpose. Meaning, I need to compute the three parameters, and then I'll be able to simulate a distribution. I searched for a SAS code that would convert skewness and kurtosis values into the three parameters and found this link.

I assume that the code I need is under D2. But I can't figure out how it works... I copy-pasted the code into SAS but, as is, it doesn't work. What should I do to make it work? Is it the right code to begin with?

Thank you in advance.

```
proc iml;
/* Find the root of this function */
start FlFunc(x) global (g_target);
/* g_target=(skewness, kurtosis) */
return ( Fleishman(x) - (1 // g_target[1] // g_target[2]) );
finish FlFunc;
/* derivatives of the Fleishman function */
start FlDeriv(x);
b = x[1]; c = x[2]; d = x[3];
b2 = b##2; c2 = c##2; d2 = d##2; bd = b#d;
df1db = 2#b + 6#d;
df1dc = 4#c;
df1dd = 6#b + 30#d;
df2db = 4#c#(b + 12#d);
df2dc = 2#(b2 + 24#bd + 105#d2 + 2);
df2dd = 4#c#(12#b + 105#d);
df3db = 24#(d + c2#(2#b + 28#d) + 48#d##3);
df3dc = 48#c#(1 + b2 + 28#bd + 141#d2);
df3dd = 24#(b + 28#b#c2 + 2#d#(12 + 48#bd + 141#c2 + 225#d2)+ d2#(48#b + 450#d));
J = (df1db || df1dc || df1dd) //
(df2db || df2dc || df2dd) //
(df3db || df3dc || df3dd);
return( J );
finish FlDeriv;
/* Newton's method to find roots of a function.
You must supply the functions that compute
the function and the Jacobian matrix.
Input: x0 is the starting guess
optn[1] = max number of iterations
optn[2] = convergence criterion for || f ||
Output: x contains the approximation to the root */
start Newton(x, x0, optn);
maxIter = optn[1]; converge = optn[2];
x = x0;
f = FlFunc(x);
do iter = 1 to maxIter while(max(abs(f)) > converge);
J = FlDeriv(x);
delta = -solve(J, f);
/* correction vector */
x = x + delta;
/* new approximation */
f = FlFunc(x);
end;
/* return missing if no convergence */
if iter > maxIter then x = j(nrow(x0),ncol(x0),.);
finish Newton;
/*Given (skew, kurt), produce an initial guess of the Fleishman
coefficients to use for Newton's algorithm. This guess is
produced by a polynomial regression.*/
start FleishmanIC(skew, kurt);
c = j(3,1);
c[1] = 0.95357 -0.05679*kurt + 0.03520*skew##2 + 0.00133*kurt##2; /*c1 = Quad(skew, kurt)*/
c[2] = 0.10007*skew + 0.00844*skew##3; /*c2 = Cubic(skew)*/
c[3] = 0.30978 -0.31655*c[1]; /*c3 = Linear(c1)*/
return (c);
finish;
start FitFleishmanFromSK(skew, kurt) global (g_target);
/*1. check that (skew,kurt) is in the feasible region*/
if kurt < -1.13168+1.58837*skew##2 then return({. . . .});
/*2. Initial guess for nonlinear root finding*/
x0 = FleishmanIC(skew, kurt);
optn = {25, 1e-5}; /*maximum iterations, convergence criterion*/
/*3. Find cubic coefficients (c1, c2, c3) so that -c2+c1*Z+c2*Z##2+c3*Z##3 has target (skew,kurt).*/
g_target = skew || kurt; /*set global variable*/
run Newton(coef, x0, optn); /*find (c1, c2, c3)*/
return( -coef[2] // coef );
finish;
/*test the Fleishman functions*/
c = FitFleishmanFromSK(1.15, 2);
print c[rowname=("c0":"c3")];
```

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

*> Did you mean Section 16.8 Fleishman's Method? *

Yes

*> I need to generate a distribution with the specified skewness and kurtosis. How should I change the code in this case? *

That's section 16.10. The program on p. 316 includes the statements:

```
/* 1. Create equally spaced grid in (skewness, kurtosis) space:
{0, 0.2, 0.4,..., 2.4} x {-2, -1.5, -1,..., 10} */
sk = Expand2DGrid( do(0,2.4,0.2), do(-2,10,0.5) );
skew = sk[,1]; kurt = sk[,2];
/* 1a. Discard invalid pairs */
idx = loc(kurt > (-1.2264489 + 1.6410373# skew##2));
sk = sk[idx, ]; /* keep these values */
skew = sk[,1]; kurt = sk[,2];
...
c = FitFleishmanFromSK(skew[i], kurt[i]); /* find Fleishman coefs */
```

You can use the method in that section to systematically explore (skewness, kurtosis) space.

10 REPLIES 10

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

You didn't define (or load) the Fleishman function.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

OK... so, how (and where) do I do that?

I thought it loads at the end where it says that skewness is 1.15 and kurtosis is 2, and asks to return c values.

I thought it loads at the end where it says that skewness is 1.15 and kurtosis is 2, and asks to return c values.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

The module is defined in the middle of page 358.

You can also get the complete code directly (rather than cut/paste from a PDF file) from the book's web page. Go to

https://support.sas.com/en/books/authors/rick-wicklin.html

and click on "Example Code and Data."

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thank you for the code! That helps!

I will risk looking stupid but I still don't understand where in the code I need to put my numbers for skewness and kurtosis so that the c coefficients for these values will be displayed...

I will risk looking stupid but I still don't understand where in the code I need to put my numbers for skewness and kurtosis so that the c coefficients for these values will be displayed...

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

*Simulating Data with SAS, *p 311-314. Also see Section 16.10, which describes how to perform a systematic simulation study by varying skewness and kurtosis over a grid of values.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thank you! Did you mean Section 16.8 Fleishman's Method? I am using an online version of the book through my university's library, and it doesn't have page numbers.

I am slowly figuring things out... So, I understood that I need to save the RandFleishman.sas file, create a new sas file, and call for the RandFleishman.sas file there. I read the code that you provide in Section 16.8 in the book but, as far as I understood, it fits the model to the data. But I don't have data. I just need to generate a distribution with the specified skewness and kurtosis. How should I change the code in this case?

Thank you again.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

*> Did you mean Section 16.8 Fleishman's Method? *

Yes

*> I need to generate a distribution with the specified skewness and kurtosis. How should I change the code in this case? *

That's section 16.10. The program on p. 316 includes the statements:

```
/* 1. Create equally spaced grid in (skewness, kurtosis) space:
{0, 0.2, 0.4,..., 2.4} x {-2, -1.5, -1,..., 10} */
sk = Expand2DGrid( do(0,2.4,0.2), do(-2,10,0.5) );
skew = sk[,1]; kurt = sk[,2];
/* 1a. Discard invalid pairs */
idx = loc(kurt > (-1.2264489 + 1.6410373# skew##2));
sk = sk[idx, ]; /* keep these values */
skew = sk[,1]; kurt = sk[,2];
...
c = FitFleishmanFromSK(skew[i], kurt[i]); /* find Fleishman coefs */
```

You can use the method in that section to systematically explore (skewness, kurtosis) space.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thank you! I think I figured out how it works. I didn't use a grid for now, and I also didn't use the RandFleishman function (it creates a matrix, and I haven't figured out yet how to use it). But I think the code below works correctly (and I am sure it looks very clumsy but I am just learning...). But I am curious about why my skewness and kurtosis values have such a big range and variance (especially kurtosis)... I thought in all samples these values will be quite close to the specified.

```
/* Define and store the Fleishman modules */
%include "C:\...\RandFleishman.sas";
%let NC = 500;
%let N_samples = 20;
%let skew = 1.5;
%let kurt = 3;
proc iml;
load module=_all_; /* load Fleishman modules */
c = FitFleishmanFromSK(&skew, &kurt); /* find Fleishman coefs */
create DD var {"c"}; append; close;
quit;
proc transpose data=DD
out=FF;
run;
proc sql;
create table GG as
select
COL1 as c1,
COL2 as c2,
COL3 as c3,
COL4 as c4
from FF;
quit;
data MC (keep = sampleID X);
set GG;
call streaminit(12345678);
do sampleID = 1 to &N_samples;
do i = 1 to &NC;
N1 = rand("Normal");
X = c1 + c2*N1 + c3*(N1**2) + c4*(N1**3);
output;
end;
end;
run;
proc univariate data=MC noprint;
by sampleID;
var X;
output out = OUTSTAT mean = mean std = SD skewness = skew kurtosis = kurt min = min max = max;
run;
proc print data = OUTSTAT;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Your question is addressed throughout Chapter 16. In particular, read the sections on the moment ration diagram, and look at Fig 16..6 and 16.9. The paragraph in section 16.7 after Fig 16.10 directly answers your question.

I don't have a textbook nearby, but I think I recall that :

The standard error of the sample mean is proportional to the variance (second central moment) of the distribution.

The standard error of the sample variance is proportional to the kurtosis (fourth central moment) of the distribution.

The standard error of the sample skewness is proportional to the sixth central moment of the distribution.

The standard error of the sample kurtosis is proportional to the eighth central moment of the distribution.

In short, higher-order statistic like skewness and kurtosis will be much more variable than the lower-order statistics (mean, variance). When you simulate from a known distribution, the sample statistics for skewness and kurtosis might be quite far away from the population values (especially in small samples) whereas the sample means and .variances tend to be quite close to the population values.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

That makes sense. Thank you, Rick!

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. **Registration is now open through August 30th**. Visit the SAS Hackathon homepage.

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.