I am experiencing a similar problem discussed in the following post: https://communities.sas.com/t5/SAS-GRAPH-and-ODS-Graphics/proc-genmod-graphics-for-count-data-model-fit-assessment/m-p/353052/highlight/true#M12249 I am attempting to use graphics to illustrate the fit of both a Poisson model and negative binomial model to my data prior to model building. The data I have is count data with an offset examining incidence risk. The outcome is the number of new infections and the offset is the natural log of the number of subjects susceptible. According to the fit statistics ( Deviance/Pearson Chi-Square) a negative binomial distribution has a better fit. However, when I try to fit expected distribution curves to my data (PDF) or to create CDF curves, negative binomial clearly does not fit my data. I believe my problem is in the parameter estimation for my curves and not actually the lack-of-fit. When I use my code without an offset, both graphs are produced "correctly". When I include the offset, then the negative binomial graph does not fit. Can anyone provide me with assistance or suggestions in parameter estimation? Below is the code (including the offset) I have utilized from various resources online and within the SAS community. I did not include the code without the offset, for the sake of space, but can provide if desired. The code is identical without the offset included within my Proc Genmod commands. I have also included my data. /* Tabulate counts and plot data */ proc freq data=herdsummary; tables HerdConv / out=FreqOut plots=FreqPlot(scale=percent); run; proc univariate data=herdsummary; var herdconv; cdfplot herdconv / vscale=proportion odstitle="Empirical CDF" odstitle2="PROC UNIVARIATE"; ods output cdfplot=outCDF; /* data set contains ECDF values */ run; **********************CDF- OFFSET************************; proc genmod data=herdsummary; TITLE "Poisson With Offset"; model herdconv= /dist=Poisson offset=logherdretest dscale; /* No variables are specified, only mean is estimated. */ output out=PoissonFitOS p=lambda; run; /* Save Poisson parameter lambda in macro variables. */ data _null_; set PoissonFitOS(obs=1); call symputx('lambda', lambda); run; /* Use Min/Max values and the fitted Lambda to create theoretical Poisson Data. */ data TheoreticalPoissonOS; do HerdConv= 0 to 15; po=cdf('Poisson', HerdConv, &lambda); output; end; run; proc genmod data=herdsummary; TITLE "Negative Binomial with Offset"; model HerdConv= /dist=NegBin offset=logherdretest; /* No variables are specified, only mean is estimated */ output out= herdsummaryNB p=NBPred; ods output parameterestimates=NegBinParametersOS; run; /* Transpose Data. */ proc transpose data=NegBinParametersOS out=NegBinParametersOS; var estimate; id parameter; run; /* Calculate k and p from intercept and dispersion parameters. */ data NegBinParametersOS; set NegBinParametersOS; k = 1/(dispersion); p = 1/(1+exp(intercept)*dispersion); run; /* Save k and p in macro variables. */ data _null_; set NegBinParametersOS; call symputx('k', k); call symputx('p', p); run; /* Calculate theoretical Negative Binomial PMF based on fitted k and p. */ data TheoreticalNegBinOS; do HerdConv=0 to 15; NegBin=cdf('NegBinomial', HerdConv, &p, &k); output; end; run; data testOS; Merge theoreticalpoissonOS TheoreticalNegBinOS outcdf; Run; proc sgplot data=testOS; TITLE " Cumulative with Offset"; step x=ECDFX y=ECDFY / lineattrs=(pattern=1 thickness=2)legendlabel="ECDF"; step x=HerdConv y=PO /lineattrs=(pattern=15 thickness=2) legendlabel="Poisson Model Fit"; Step x=Herdconv y=NegBin/lineattrs=(pattern=2 thickness=2) legendlabel="NegBin Model Fit"; xaxis grid label="x" offsetmin=0.05 offsetmax=0.05; yaxis grid min=0 label="Cumulative Proportion"; run; ***********; /* Fit Poisson distribution to data. */ proc genmod data=herdsummary; TITLE "Poisson with Offset"; model herdconv= /dist=Poisson offset=logherdretest; /* No variables are specified, only mean is estimated. */ output out=PoissonFitO p=lambda; run; /* Save Poisson parameter lambda in macro variables. */ data _null_; set PoissonFitO(obs=1); call symputx('lambda', lambda); run; /* Use Min/Max values and the fitted Lambda to create theoretical Poisson Data. */ data TheoreticalPoissonO; do HerdConv= 0 to 15; po=pdf('Poisson', HerdConv, &lambda); output; end; run; /* Fit Negative Binomial distribution to data. */ proc genmod data=herdsummary; TITLE "Negative Bin with Offset"; model herdconv= /dist=NegBin offset=logherdretest; /* No variables are specified, only mean is estimated */ ods output parameterestimates=NegBinParametersO; run; /* Transpose Data. */ proc transpose data=NegBinParametersO out=NegBinParametersO; var estimate; id parameter; run; /* Calculate k and p from intercept and dispersion parameters. */ data NegBinParametersO; set NegBinParametersO; k = 1/dispersion; p = 1/(1+exp(intercept)*dispersion); run; /* Save k and p in macro variables. */ data _null_; set NegBinParametersO; call symputx('k', k); call symputx('p', p); run; /* Calculate theoretical Negative Binomial PMF based on fitted k and p. */ data TheoreticalNegBinO; do herdconv=0 to 15; NegBin=pdf('NegBinomial', herdconv, &p, &k); output; end; run; /* Merge The datasets for plotting. */ data PlotDataO(keep=herdconv freq po negbin); merge TheoreticalPoissonO TheoreticalNegBinO FreqOut; by herdconv; freq = PERCENT/100; run; /* Overlay fitted densities with original data. */ title 'Count data overlaid with fitted distributions-WITH OFFSET'; proc sgplot data=PlotDataO noautolegend; vbarparm category=herdconv response=freq / legendlabel='Count Data'; series x=herdconv y=po / markers markerattrs=(symbol=circlefilled color=red) lineattrs=(color=red)legendlabel='Fitted Poisson PMF'; series x=herdconv y=NegBin / markers markerattrs=(symbol=squarefilled color=green) lineattrs=(color=green)legendlabel='Fitted Negative Binomial PMF'; xaxis display=(nolabel); yaxis display=(nolabel); keylegend / location=inside position=NE across=1; run; title;
I have attached graphs generated with and without the offset. As you see, the code "works/gives an expected distribution" when an offset is not included. However, when an offset IS included, there is a spike at zero not explained by the data.
... View more