BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Tramp
Fluorite | Level 6

For a given dataset, I need to be able to determine the best-fitting Johnson curve (whether that be a Johnson Su, Sb or Sl distribution). Having determined the parameters for this "best" model, I then need to evaluate the inverse distribution function. Does anyone know how to do this? I am relatively new to SAS.

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

I suspect the JOHNSYS macro was written before SU and SB were added to UNIVARIATE. I think it uses traditional SAS/GRAPH graphics.

 

Many times we know in advance whether the distribution reflects a bounded process or not. However, you can let the data themselves suggest whether they are best fit by SU or SB. The macro makes that determination by using PROC IML, then creates a macro variable, &TYPE, which has the value '1', '2', or '3' for SU, SB, or lognormal, respectively.

 

The bulk of the macro is creating an annotation data set to overlay the curve on the histogram. You don't need that annotation overlay any more. If you know what type you are trying to fit, you can use the SU, SB, or LOGNORMAL options on the HISTOGRAM statement. The PROC will automatically overlay the curve. You can also use the CDFPLOT statement to get an ECDF of the data and the overlaid fit.

View solution in original post

6 REPLIES 6
Tramp
Fluorite | Level 6
Hi
Many thanks. When I run the macro I only get the histograms of sbdata, sudata and sldata as output: I don't seem to get the parameter estimates. What am I doing wrong?
Neil
Rick_SAS
SAS Super FREQ

Use PROC UNIVARIATE to fit Sb and Su curves. For example,

 


proc univariate data=Have;
   histogram x / Su;
run;

The output includes parameter estimates.

Tramp
Fluorite | Level 6

Thanks.

 

I don't understand though. The boiler plate in the JOHNSYS macro says:

 

NOTES: */

/* MISC: Macro JOHNSYS takes as input the data set (stored */

/* as macro variable DATA). The proper Johnson system */

/* is selected using the ratio given in Slifker and */

/* Shenton (1980). The parameter estimates for the */

/* appropriate system are then computed and displayed */

/* along with the johnson curve on a histogram. */

 

 

When I run it, all I get is histograms. I thought it was supposed to choose which curve fits best (Su, Sb or Sl) and output the parameters for this model.

 

Rick_SAS
SAS Super FREQ

I suspect the JOHNSYS macro was written before SU and SB were added to UNIVARIATE. I think it uses traditional SAS/GRAPH graphics.

 

Many times we know in advance whether the distribution reflects a bounded process or not. However, you can let the data themselves suggest whether they are best fit by SU or SB. The macro makes that determination by using PROC IML, then creates a macro variable, &TYPE, which has the value '1', '2', or '3' for SU, SB, or lognormal, respectively.

 

The bulk of the macro is creating an annotation data set to overlay the curve on the histogram. You don't need that annotation overlay any more. If you know what type you are trying to fit, you can use the SU, SB, or LOGNORMAL options on the HISTOGRAM statement. The PROC will automatically overlay the curve. You can also use the CDFPLOT statement to get an ECDF of the data and the overlaid fit.

Tramp
Fluorite | Level 6

Thank you so much. However, my SAS knowledge is not good enough to implement this. In the code below. I want to return the value of Type (or DistType) to the main macro and then use this to determine whether to perform the Su, Sb or lognormal CDFPLOT. I don't have the skills to do this. I presume I need to use symput.

 

%johnsys(dataset);

proc univariate data=dataset;

histogram x / lognormal;

run;

proc univariate data=dataset;

if "&DistType" = 1 then cdfplot x / su;

else if "&DistType" = 2 then cdfplot x / sb;

else cdfplot x / lognormal;

run;

%macro johnsys(data);

 

proc iml;

sort &data out=sorted by x;

use sorted;

read all var {x} into x;

nobs=nrow(x);

/*--- Choose z-value for percentile fit ---*/

zval = .524;

p3 = probnorm(3*zval);

 

p1 = probnorm(zval);

pm1 = probnorm(-1*zval);

pm3 = probnorm(-3*zval);

i4 = p3*nobs + .5;

i3 = p1*nobs + .5;

i2 = pm1*nobs + .5;

i1 = pm3*nobs + .5;

x3z = x[int(i4)] + mod(i4,1)*(x[int(i4)+1] - x[int(i4)]);

x1z = x[int(i3)] + mod(i3,1)*(x[int(i3)+1] - x[int(i3)]);

xm1z = x[int(i2)] + mod(i2,1)*(x[int(i2)+1] - x[int(i2)]);

xm3z = x[int(i1)] + mod(i1,1)*(x[int(i1)+1] - x[int(i1)]);

 

m = x3z - x1z;

n = xm1z - xm3z;

p = x1z - xm1z;

/*--- Ratio used to choose proper Johnson system ---*/

ratio = m*n/p**2;

tol = .05;

/*---Select appropriate Johnson Family & Estimate Parameters---*/

if (ratio > 1.0 + tol) then do;

/*--- Johnson Su Parameter Estimates ---*/

temp = .5*(m/p + n/p);

eta = 2*zval/(log(temp + sqrt(temp*temp -1 )));

temp = (n/p - m/p) / ( 2*(sqrt(m*n/(p*p) - 1)) );

gamma = eta*log(temp + sqrt(temp*temp + 1));

lambda = 2*p*sqrt(m*n/(p*p)-1)/((m/p+n/p- 2)*sqrt(m/p+n/p+2));

epsilon = (x1z + xm1z)/2 + p*(n/p - m/p) / ( 2*(m/p+n/p-2));

 

create parms var{eta gamma lambda epsilon};

append;

close parms;

type = '1';

 

end;

else if (ratio < 1.0 - tol ) then do;

/*--- Johnson Sb Parameter Estimates ---*/

temp = .5* sqrt( (1 + p/m)*(1 + p/n) );

eta = zval / log( temp + sqrt(temp*temp - 1) );

temp = (p/n-p/m)*sqrt((1+p/m)*(1+p/n)- 4)/(2*(p*p/(m*n)-1));

gamma = eta*log(temp + sqrt(temp*temp + 1) );

lambda = p*sqrt(( (1 + p/m)*(1+p/n)-2)**2 - 4)/(p*p/(m*n)-1);

epsilon = (x1z + xm1z)/2-lambda/2+p*(p/n-p/m)/(2*(p*p/(m*n)-1));

 

create parms var{eta gamma lambda epsilon};

append;

close parms;

type = '2';

 

end;

else do;

/* Johnson Sl Parameter Estimates */

eta = 2*zval / log(m/p);

gamma = eta*log( (m/p - 1) / (p*sqrt(m/p)) );

epsilon = (x1z + xm1z)/2 - (p/2)* (m/p + 1) / (m/p - 1);

/* create dataset containing parameters */

 

create parms var{gamma eta epsilon};

append;

close parms;

type = '3';

 

end;

call symput("DistType",type);

 

*quit;

%mend;

run;

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 845 views
  • 2 likes
  • 3 in conversation