
proc genmod graphics for count data model fit assessment

Contributor
Posts: 49

I'd like to include before-and-after model fit visuals (PROC GENMOD using DIST=NEGBIN or DIST=POISSON) in my poster, comparing clean data (unique patients) with duplicated data (patients counted more than once). The idea is to show the impact of deduplication, if any. The global fit statistics show that the negative binomial fits my data better, but I'd like to present this visually. (I'm using SAS 9.4.)

 

The ideal plot is the one shown in the SUGI paper attached here. Below is a screenshot of its Figure 4, a nice cumulative probability graph comparing the observed data with the negative binomial and Poisson fits. However, the paper doesn't cover how to make it. Does anybody know how to reproduce this kind of plot?

[Image: screenshot of Figure 4 from the SUGI paper (want image.png)]

I read about the ODS Graphics options for PROC GENMOD and tried the ASSESS VAR= option as shown below, but it produced no output. Any idea why?

 

 

ods graphics on;
proc genmod data=mydata;
   class exposure(ref="0")/param=ref;
   model outcome=exposure/ dist=negbin link=log offset=ln;
   assess var=(outcome)/resample=10000
                        seed=603708000
                        crpanel;
   ods trace on;
   run;

https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_genmod_sect...

 

Below are the first 10 observations of my data, where outcome is the count (number of cancers), zone is the exposure (values 0, 1, and 2), and grand_total is the ZIP code population (the denominator).

 

 

[Image: first 10 observations of mydata (mydata.png)]

 

I appreciate your time! Thanks in advance.




All Replies
Grand Advisor
Posts: 16,416

Re: proc genmod graphics for count data model fit assessment

Are you asking how to get the data for the graph or how to graph the data or both?

 

If graphing, you can use SGPLOT with the STEP and SERIES statements to get the graph shown, but you do need the data first :)

Contributor
Posts: 49

Re: proc genmod graphics for count data model fit assessment

I have both data sets ready: one clean and the other with duplicates. Could you please show me a syntax example?
Contributor
Posts: 49

Re: proc genmod graphics for count data model fit assessment

I think I see what you're asking now. Are you saying that I have to prepare the data for SGPLOT first? If so, no, I haven't prepared any data for SGPLOT yet. Do you know how?
Grand Advisor
Posts: 16,416

Re: proc genmod graphics for count data model fit assessment

No, I don't know how to get the estimates. That should be your first question :)

Contributor
Posts: 49

Re: proc genmod graphics for count data model fit assessment

I have calculated my estimates for both the negative binomial and Poisson models. I'm looking for hints on how to turn those estimates into a data set to feed PROC SGPLOT. Please feel free to share links to any resources.
Grand Advisor
Posts: 16,416

Re: proc genmod graphics for count data model fit assessment

It's pretty straightforward...

 

You should have estimates with the count and probability, i.e., the data you want on the charts.

Then use SGPLOT. If you can post the data so we can run/reproduce your results, I can help you there; otherwise, you'll have to wait for someone else.

 

Check out robslink.com for examples, though be careful to pick the ones that use the SG procedures. You can also review the SG documentation for examples.
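One way to build the "count and probability" data set described above is PROC FREQ with the OUTCUM option, which writes the cumulative count (CUM_FREQ) and cumulative percentage (CUM_PCT) for each distinct value. A minimal sketch, assuming the 19 example counts and the Poisson mean of 7.1 that appear in the accepted solution later in this thread; the data set names FreqOut and FreqCDF are made up for illustration.

```sas
/* Example counts: the 19 values used in the accepted solution */
data MyData;
   input x @@;
   datalines;
7 7 13 9 8 8 9 9 5 6 6 9 5 10 4 5 3 8 4
;

/* OUTCUM adds CUM_FREQ and CUM_PCT to the OUT= data set */
proc freq data=MyData noprint;
   tables x / out=FreqOut outcum;
run;

%let Lambda = 7.1;   /* Poisson mean reported for these data in the accepted solution */
data FreqCDF;
   set FreqOut;
   EmpCDF  = CUM_PCT / 100;                /* observed cumulative proportion */
   PoisCDF = cdf("Poisson", x, &Lambda);   /* model cumulative probability   */
run;

proc sgplot data=FreqCDF;
   step x=x y=EmpCDF  / legendlabel="Observed";
   step x=x y=PoisCDF / legendlabel="Poisson";
   xaxis grid label="x";
   yaxis grid min=0 max=1 label="Cumulative Proportion";
run;
```

Because the model CDF is evaluated only at the observed values, its step between two observed values (for example 10 and 13) is drawn flat, which hides the model probability at counts that never occur. Evaluating the model CDF on a full integer grid avoids that.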

 

Contributor
Posts: 49

Re: proc genmod graphics for count data model fit assessment

"If you can post": what are you referring to? My estimates? I can post them. I'm also unsure what the "cumulative probability" on the graph is based on.
Grand Advisor
Posts: 16,416

Re: proc genmod graphics for count data model fit assessment

Sorry, that should be if you can post the data. 

 

If you don't know what the graph is, why are you creating it?

Contributor
Posts: 49

Re: proc genmod graphics for count data model fit assessment

I can post the data; please see the attachment. Why do I want to create it? I was looking for the most appropriate SAS visual output to illustrate model fit from PROC GENMOD with a negative binomial/Poisson distribution, and I stumbled across the reference attached here. I understand the concept of the graph, but I'm asking for your help with how the cumulative probability on that plot was computed.
Contributor
Posts: 49

Re: proc genmod graphics for count data model fit assessment

@Reeza, see the attached data. I'll check back in a few hours; I'm in the Eastern time zone.
Solution
‎04-25-2017 03:10 PM
SAS Super FREQ
Posts: 3,236

Re: proc genmod graphics for count data model fit assessment

You can use PROC UNIVARIATE to obtain the empirical cumulative probabilities, as shown in 

http://blogs.sas.com/content/iml/2016/09/06/graph-step-function-sas.html
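For reference, here is what the "cumulative probability" axis in such a plot represents. For the observed counts x_1, ..., x_n it is the empirical CDF, the proportion of observations at or below each value, which is what CDFPLOT with VSCALE=PROPORTION reports. For a fitted model it is the model CDF, the running sum of the fitted probability mass function, which is what the CDF function returns. The negative binomial is written in the mean/dispersion form (variance mu + k mu^2) that PROC GENMOD uses, with lambda, mu and k standing for the fitted values:

```latex
\hat{F}_n(x) = \frac{1}{n}\,\#\{\, i : x_i \le x \,\},
\qquad
F(x) = \sum_{y=0}^{\lfloor x \rfloor} P(Y = y)

\text{Poisson: } P(Y = y) = \frac{e^{-\lambda}\,\lambda^{y}}{y!}

\text{Negative binomial: } P(Y = y) =
  \frac{\Gamma(y + 1/k)}{\Gamma(y + 1)\,\Gamma(1/k)}
  \cdot \frac{(k\mu)^{y}}{(1 + k\mu)^{y + 1/k}},
\qquad \operatorname{Var}(Y) = \mu + k\mu^{2}
```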

You can then use the STEP or SERIES statement in PROC SGPLOT to graph it, as in this example:

 

data MyData;
input x @@;
datalines;
   7 7 13 9 8 8 9 9 5 6 6 9 5 10 4 5 3 8 4
;

/* http://blogs.sas.com/content/iml/2016/09/06/graph-step-function-sas.html */
ods select cdfplot;
proc univariate data=MyData;
cdfplot x / vscale=proportion 
         odstitle="Empirical CDF" odstitle2="PROC UNIVARIATE";
ods output cdfplot=outCDF;   /* data set contains ECDF values */
run;
 
title "Empirical CDF";
proc sgplot data=outCDF noautolegend;
   step x=ECDFX y=ECDFY;          /* variable names created by PROC UNIVARIATE */
   xaxis grid label="x" offsetmin=0.05 offsetmax=0.05;
   yaxis grid min=0 label="Cumulative Proportion";
run;

To overlay additional curves, take the parameter estimates from PROC GENMOD and use the CDF function in a DATA step to compute the predicted CDFs for the Poisson (and NB) distribution. You can either evaluate the CDF at the data, or you can evaluate the CDF on a grid of points, as shown in 

http://blogs.sas.com/content/iml/2012/04/04/fitting-a-poisson-distribution-to-data-in-sas.html

(If you use a uniform grid, you need to merge the data and the CDF.)
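A sketch of the first step, getting the estimates out of PROC GENMOD and into macro variables. It assumes simulated data (the data set Sim, the seed, and the true mean 5 and dispersion 0.5 are made up for illustration) and intercept-only models, so exp(Intercept) is the fitted mean. In a model like the one in the original question (CLASS exposure with OFFSET=ln), exp(Intercept) is the fitted mean only for the reference exposure level at an offset of zero, so it is not a single mean count for the whole data set; the OUTPUT statement's PRED= option gives per-observation fitted means. Note also that for DIST=NEGBIN the dispersion k is reported as its own "Dispersion" row and should not be exponentiated.

```sas
/* Hypothetical simulated counts (mean 5, dispersion 0.5), drawn as a
   Poisson-gamma mixture so that Var(y) = mu + k*mu**2 */
data Sim;
   call streaminit(12345);
   do i = 1 to 500;
      g = rand("Gamma", 2) / 2;    /* gamma with mean 1 and variance 0.5 */
      y = rand("Poisson", 5 * g);  /* Poisson count given the gamma draw */
      output;
   end;
   keep y;
run;

/* Intercept-only fits; ODS OUTPUT saves each estimates table */
proc genmod data=Sim;
   model y = / dist=poisson link=log;
   ods output ParameterEstimates=PoisEst;
run;

proc genmod data=Sim;
   model y = / dist=negbin link=log;
   ods output ParameterEstimates=NBEst;
run;

/* Log link with no covariates: exp(Intercept) is the fitted mean.
   The NB dispersion k is reported on its own scale, so no exp(). */
data _null_;
   set PoisEst;
   if Parameter = "Intercept" then call symputx("Lambda", exp(Estimate));
run;

data _null_;
   set NBEst;
   if Parameter = "Intercept"  then call symputx("Mu", exp(Estimate));
   if Parameter = "Dispersion" then call symputx("K",  Estimate);
run;

%put NOTE: Lambda=&Lambda Mu=&Mu K=&K;
```

The macro variables can then replace the hard-coded %let Lambda = 7.1 in the example that follows. The values written to the log should be close to, but not exactly equal to, the simulation settings, because the data are random.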

Here is an example where GENMOD gave Lambda=7.1 for the Poisson fit and the Poisson CDF is evaluated at the data:

 

%let Lambda = 7.1;  /* param estimate from GENMOD fit */
data All;
set outCDF;
PoisCDF = cdf("Poisson", ECDFX, &Lambda);
run;
proc sgplot data=All;
   step x=ECDFX y=ECDFY / legendlabel="ECDF";
   step x=ECDFX y=PoisCDF / legendlabel="Model Fit";
   xaxis grid label="x" offsetmin=0.05 offsetmax=0.05;
   yaxis grid min=0 label="Cumulative Proportion";
run;
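To add the negative binomial curve, the GENMOD mean and dispersion have to be translated into the arguments of the CDF function. A minimal sketch, assuming GENMOD's parameterization (mean mu, dispersion k, variance mu + k*mu^2) and SAS's CDF('NEGBINOMIAL', m, p, n), where m counts failures, p is the probability of success and n is the number of successes. Setting n = 1/k and p = 1/(1 + k*mu) gives mean n(1 - p)/p = mu and variance mu + k*mu^2. The value K=0.2 is a hypothetical placeholder; Lambda and Mu reuse the 7.1 from the example above. The count m must be the second argument: if a data value lands in the probability slot instead, any value outside [0, 1] makes the function return a missing value. Real fits usually give a non-integer n = 1/k, so check the CDF function documentation for your release to confirm that a non-integer n is accepted.

```sas
/* Hypothetical estimates: substitute the values from your own GENMOD fits */
%let Lambda = 7.1;   /* Poisson mean                                */
%let Mu     = 7.1;   /* NB mean, exp(Intercept)                     */
%let K      = 0.2;   /* NB dispersion, the "Dispersion" estimate    */

data ModelCDF;
   n = 1 / &K;                 /* number of successes in SAS's form      */
   p = 1 / (1 + &K * &Mu);     /* success probability, so that mean = Mu */
   do m = 0 to 20;             /* integer grid of counts                 */
      PoisCDF = cdf("Poisson", m, &Lambda);
      NBCDF   = cdf("NEGBINOMIAL", m, p, n);   /* the count m comes first */
      output;
   end;
   keep m PoisCDF NBCDF;
run;

proc sgplot data=ModelCDF;
   step x=m y=PoisCDF / legendlabel="Poisson";
   step x=m y=NBCDF   / legendlabel="Negative binomial";
   xaxis grid label="Count";
   yaxis grid min=0 max=1 label="Cumulative Probability";
run;
```

Because the grid covers every integer, the model curves include counts that never occur in the data. To overlay the ECDF, combine this data set with the PROC UNIVARIATE output as described above.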

 

 

 

Contributor
Posts: 49

Re: proc genmod graphics for count data model fit assessment

@Rick_SAS, thanks.

I have adapted your code to my data. The bummer is that CDFPLOT shows up in red, and I get the error log below:

ods select cdfplot;
proc univariate data=post.zipcrude5;
cdfplot rate / vscale=proportion odstitle="Empirical CDF" odstitle2="PROC UNIVARIATE";
ods output cdfplot=post.outCDF; 
run;

ERROR 22-322: Syntax error, expecting one of the following: ;, (, ALL, ALPHA, ANNOTATE, CIBASIC,
              CIPCTLDF, CIPCTLNORMAL, CIQUANTDF, CIQUANTNORMAL, DATA, DEBUG, EXCLNPWGT, FREQ,
              GOUT, LOCCOUNT, MODE, MODES, MU0, NEXTROBS, NEXTRVAL, NOBYPLOT, NOPRINT, NORMAL,
              NOTABCONTENTS, NOVARCONTENTS, OUTTABLE, PCTLDEF, PLOT, PLOTSIZE, ROBUSTSCALE,
              ROUND, SUMMARYCONTENTS, TRIMMED, VARDEF, WINSORIZED.
ERROR 76-322: Syntax error, statement will be ignored.
4989           odstitle="Empirical CDF" odstitle2="PROC UNIVARIATE";
4990  ods output cdfplot=post.outCDF;
4991  run;

NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE UNIVARIATE used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

WARNING: Output 'cdfplot' was not created.  Make sure that the output object name, label, or path
         is spelled correctly.  Also, verify that the appropriate procedure options are used to
         produce the requested output object.  For example, verify that the NOPRINT option is not
         used.

SAS Super FREQ
Posts: 3,236

Re: proc genmod graphics for count data model fit assessment

Don't worry about the red color. It just means that the syntax highlighter doesn't know that statement. It has been in PROC UNIVARIATE since forever. 

 

The ERROR is on the PROC UNIVARIATE statement, so I suspect you had a copy-paste error that you corrected in the code you pasted. Try it again.  If you still get an error, try it on data we all have access to:

 

ods select cdfplot;
proc univariate data=sashelp.cars;
cdfplot mpg_city / vscale=proportion odstitle="Empirical CDF" odstitle2="PROC UNIVARIATE";
ods output cdfplot=outCDF; 
run;
Contributor
Posts: 49

Re: proc genmod graphics for count data model fit assessment

@Rick_SAS,

Thanks for the previous comments. Everything worked out.

 

1. I tried the following, but the NegBCDF column is all missing in the output data set (alldup).

Lambda1 is exp(Estimate) from the parameter estimates table saved with "ods output parameterestimates=data;".

 

 

ods select cdfplot;
proc univariate data=sashelp.cars;
cdfplot mpg_city / vscale=proportion odstitle="Empirical CDF" odstitle2="PROC UNIVARIATE";
ods output cdfplot=outCDFcar; 
run;
%let Lambda1=1.04;
data alldup; set outCDFcar;
NegBCDF=cdf('NEGB',1, ECDFX, &Lambda1);
run;

2. Below is the desired image. From PROC UNIVARIATE I have two separate CDF data sets, N=1122 for the clean data and N=1133 for the uncleaned data. Any idea how I can overlay them on the same plot, as shown below?

 

 

[Image: desired plot (decired plot.png)]
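For question 2, a minimal sketch of overlaying two ECDFs: stack the two PROC UNIVARIATE output data sets with a variable that identifies the source, then draw them with one STEP statement and GROUP=. The "clean" values are the 19 counts from the accepted solution; the "duplicated" data set (the first three records repeated) is hypothetical, as are the names Clean, Dup, CDFClean, CDFDup, BothCDF and Source. With real data, replace the two DATA steps with your clean and uncleaned data sets.

```sas
ods graphics on;

/* Example "clean" data: the values from the accepted solution */
data Clean;
   input x @@;
   datalines;
7 7 13 9 8 8 9 9 5 6 6 9 5 10 4 5 3 8 4
;

/* Hypothetical "duplicated" data: the first three records appear twice */
data Dup;
   set Clean;
   output;
   if _N_ <= 3 then output;
run;

/* ECDF of each data set, captured with ODS OUTPUT
   (ODS SELECT resets after each step, so it is repeated) */
ods select cdfplot;
proc univariate data=Clean;
   cdfplot x / vscale=proportion;
   ods output cdfplot=CDFClean;
run;

ods select cdfplot;
proc univariate data=Dup;
   cdfplot x / vscale=proportion;
   ods output cdfplot=CDFDup;
run;

/* Stack the two ECDF data sets; Source labels each curve */
data BothCDF;
   length Source $10;
   set CDFClean(in=inClean) CDFDup;
   if inClean then Source = "Clean";
   else Source = "Duplicated";
run;

proc sgplot data=BothCDF;
   step x=ECDFX y=ECDFY / group=Source;
   xaxis grid label="x";
   yaxis grid min=0 max=1 label="Cumulative Proportion";
run;
```

Fitted model CDFs can be overlaid with additional STEP (or SERIES) statements, as in the accepted solution.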
