Solved: Fitting Johnson curves

Tramp · Posted 12-03-2019 03:47 AM

For a given dataset, I need to be able to determine the best-fitting Johnson curve (whether that be a Johnson Su, Sb or Sl distribution). Having determined the parameters for this "best" model, I then need to evaluate the inverse distribution function. Does anyone know how to do this? I am relatively new to SAS.

Rick_SAS · Posted 12-03-2019 10:59 AM

I suspect the JOHNSYS macro was written before SU and SB were added to UNIVARIATE. I think it uses traditional SAS/GRAPH graphics.

Many times we know in advance whether the distribution reflects a bounded process or not. However, you can let the data themselves suggest whether they are best fit by SU or SB. The macro makes that determination by using PROC IML, then creates a macro variable, &TYPE, which has the value '1', '2', or '3' for SU, SB, or lognormal, respectively.

The bulk of the macro is creating an annotation data set to overlay the curve on the histogram. You don't need that annotation overlay any more. If you know what type you are trying to fit, you can use the SU, SB, or LOGNORMAL options on the HISTOGRAM statement. The PROC will automatically overlay the curve. You can also use the CDFPLOT statement to get an ECDF of the data and the overlaid fit.

View solution in original post

PaigeMiller · Posted 12-03-2019 08:54 AM

https://support.sas.com/documentation/onlinedoc/qc/ex_code/132/johnsys4.html

--
Paige Miller

Tramp · Posted 12-03-2019 09:43 AM

Hi
Many thanks. When I run the macro I only get the histograms of sbdata, sudata and sldata as output: I don't seem to get the parameter estimates. What am I doing wrong?
Neil

Rick_SAS · Posted 12-03-2019 09:51 AM

Use PROC UNIVARIATE to fit Sb and Su curves. For example,


proc univariate data=Have;
   histogram x / Su;
run;

The output includes parameter estimates.

Tramp · Posted 12-03-2019 10:13 AM

Thanks.

I don't understand though. The boiler plate in the JOHNSYS macro says:

NOTES: */

/* MISC: Macro JOHNSYS takes as input the data set (stored */

/* as macro variable DATA). The proper Johnson system */

/* is selected using the ratio given in Slifker and */

/* Shenton (1980). The parameter estimates for the */

/* appropriate system are then computed and displayed */

/* along with the johnson curve on a histogram. */

When I run it, all I get is histograms. I thought it was supposed to choose which curve fits best (Su, Sb or Sl) and output the parameters for this model.

Rick_SAS · Posted 12-03-2019 10:59 AM

I suspect the JOHNSYS macro was written before SU and SB were added to UNIVARIATE. I think it uses traditional SAS/GRAPH graphics.

Many times we know in advance whether the distribution reflects a bounded process or not. However, you can let the data themselves suggest whether they are best fit by SU or SB. The macro makes that determination by using PROC IML, then creates a macro variable, &TYPE, which has the value '1', '2', or '3' for SU, SB, or lognormal, respectively.

The bulk of the macro is creating an annotation data set to overlay the curve on the histogram. You don't need that annotation overlay any more. If you know what type you are trying to fit, you can use the SU, SB, or LOGNORMAL options on the HISTOGRAM statement. The PROC will automatically overlay the curve. You can also use the CDFPLOT statement to get an ECDF of the data and the overlaid fit.

Tramp · Posted 12-04-2019 05:26 AM

Thank you so much. However, my SAS knowledge is not good enough to implement this. In the code below. I want to return the value of Type (or DistType) to the main macro and then use this to determine whether to perform the Su, Sb or lognormal CDFPLOT. I don't have the skills to do this. I presume I need to use symput.

%johnsys(dataset);

proc univariate data=dataset;

histogram x / lognormal;

run;

proc univariate data=dataset;

if "&DistType" = 1 then cdfplot x / su;

else if "&DistType" = 2 then cdfplot x / sb;

else cdfplot x / lognormal;

run;

%macro johnsys(data);

proc iml;

sort &data out=sorted by x;

use sorted;

read all var {x} into x;

nobs=nrow(x);

/*--- Choose z-value for percentile fit ---*/

zval = .524;

p3 = probnorm(3*zval);

p1 = probnorm(zval);

pm1 = probnorm(-1*zval);

pm3 = probnorm(-3*zval);

i4 = p3*nobs + .5;

i3 = p1*nobs + .5;

i2 = pm1*nobs + .5;

i1 = pm3*nobs + .5;

x3z = x[int(i4)] + mod(i4,1)*(x[int(i4)+1] - x[int(i4)]);

x1z = x[int(i3)] + mod(i3,1)*(x[int(i3)+1] - x[int(i3)]);

xm1z = x[int(i2)] + mod(i2,1)*(x[int(i2)+1] - x[int(i2)]);

xm3z = x[int(i1)] + mod(i1,1)*(x[int(i1)+1] - x[int(i1)]);

m = x3z - x1z;

n = xm1z - xm3z;

p = x1z - xm1z;

/*--- Ratio used to choose proper Johnson system ---*/

ratio = m*n/p**2;

tol = .05;

/*---Select appropriate Johnson Family & Estimate Parameters---*/

if (ratio > 1.0 + tol) then do;

/*--- Johnson Su Parameter Estimates ---*/

temp = .5*(m/p + n/p);

eta = 2*zval/(log(temp + sqrt(temp*temp -1 )));

temp = (n/p - m/p) / ( 2*(sqrt(m*n/(p*p) - 1)) );

gamma = eta*log(temp + sqrt(temp*temp + 1));

lambda = 2*p*sqrt(m*n/(p*p)-1)/((m/p+n/p- 2)*sqrt(m/p+n/p+2));

epsilon = (x1z + xm1z)/2 + p*(n/p - m/p) / ( 2*(m/p+n/p-2));

create parms var{eta gamma lambda epsilon};

append;

close parms;

type = '1';

end;

else if (ratio < 1.0 - tol ) then do;

/*--- Johnson Sb Parameter Estimates ---*/

temp = .5* sqrt( (1 + p/m)*(1 + p/n) );

eta = zval / log( temp + sqrt(temp*temp - 1) );

temp = (p/n-p/m)*sqrt((1+p/m)*(1+p/n)- 4)/(2*(p*p/(m*n)-1));

gamma = eta*log(temp + sqrt(temp*temp + 1) );

lambda = p*sqrt(( (1 + p/m)*(1+p/n)-2)**2 - 4)/(p*p/(m*n)-1);

epsilon = (x1z + xm1z)/2-lambda/2+p*(p/n-p/m)/(2*(p*p/(m*n)-1));

create parms var{eta gamma lambda epsilon};

append;

close parms;

type = '2';

end;

else do;

/* Johnson Sl Parameter Estimates */

eta = 2*zval / log(m/p);

gamma = eta*log( (m/p - 1) / (p*sqrt(m/p)) );

epsilon = (x1z + xm1z)/2 - (p/2)* (m/p + 1) / (m/p - 1);

/* create dataset containing parameters */

create parms var{gamma eta epsilon};

append;

close parms;

type = '3';

end;

call symput("DistType",type);

*quit;

%mend;

run;

Fitting Johnson curves

Re: Fitting Johnson curves

Re: Fitting Johnson curves

Re: Fitting Johnson curves

Re: Fitting Johnson curves

Re: Fitting Johnson curves

Re: Fitting Johnson curves

Re: Fitting Johnson curves

Registration is open