About HHutch

HHutch · ‎03-29-2018

You were right! The one file's date contains values after the decimal point. I didn't realize the dataset it was being pulled from used a date/time format. Thank you for your help! It is greatly appreciated!
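For readers who hit the same problem, here is a minimal sketch of one possible fix. It assumes Date in infect.qscout is a SAS date value (days since 01JAN1960) that carries a fractional part; the data set name work.qscout_clean and the IN= flags are hypothetical. If the variable were a true datetime value (seconds since 01JAN1960), DATEPART() would be the function to use instead of FLOOR().

```sas
/* Strip the fractional part so the BY values compare equal across data sets. */
data work.qscout_clean;
    set infect.qscout;
    Date = floor(Date);            /* assumption: Date is a date value plus a fraction */
    format Date mmddyy10.;
run;

/* MERGE with BY requires each input to be sorted by the BY variables. */
proc sort data=work.qscout_clean;
    by ID Date;
run;

data parameters2;
    merge parameters(in=inP) work.qscout_clean(in=inQ);
    by ID Date;
    matched = (inP and inQ);       /* 1 when both inputs contributed to the row */
run;
```

The matched flag is only there to confirm that rows now pair up; it can be dropped once the merge behaves as expected.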

HHutch · ‎03-29-2018

The log shows:

327  DATA parameters2;
328  MERGE parameters infect.qscout;
329  BY ID Date;
330  RUN;

NOTE: There were 375 observations read from the data set WORK.PARAMETERS.
NOTE: There were 233 observations read from the data set INFECT.QSCOUT.
NOTE: The data set WORK.PARAMETERS2 has 608 observations and 12 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
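The observation counts in this log are themselves a diagnostic: 375 + 233 = 608, so not a single BY group matched. One way to look for a hidden difference in the key values is to display them without the date format. This is a sketch using the data set name from the log; the column alias n_fractional is hypothetical.

```sas
/* Show the raw numeric value stored behind the formatted date. */
proc print data=infect.qscout(obs=10);
    var ID Date;
    format Date best32.;
run;

/* Count key values that carry a fractional part. */
proc sql;
    select count(*) as n_fractional
    from infect.qscout
    where Date ne int(Date);
quit;
```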

HHutch · ‎03-28-2018

Hello,

I am trying to merge three separate SAS datasets by two variables (ID Date) that are present in all three datasets. Two of the three datasets match-merge successfully. However, when I try to merge the third dataset in, SAS interleaves the datasets instead of match-merging them. I have checked the formats and lengths of the variables in the three datasets using PROC CONTENTS and they appear to be the same. I even tried redefining the format, length, and label in each dataset to ensure they were identical, and I am still not having any luck. Any suggestions?

Below is an example of my merge code. Datasets 1 and 2 (infect.weights and infect.temperature) contain four variables (ID, Group, Date, Weight/Temperature). Dataset 3 (infect.counts) contains nine variables (ID, Date, X1-X7).

DATA infect.weights;
   SET infect.weights;
   ATTRIB ID LENGTH=8 FORMAT=8. LABEL="ID";
   ATTRIB DATE LENGTH=8 FORMAT=MMDDYY10. LABEL="Date";
RUN;

DATA infect.temperature;
   SET infect.temperature;
   ATTRIB ID LENGTH=8 FORMAT=8. LABEL="ID";
   ATTRIB DATE LENGTH=8 FORMAT=MMDDYY10. LABEL="Date";
RUN;

DATA infect.cell;
   SET infect.qscout;
   ATTRIB ID LENGTH=8 FORMAT=8. LABEL="ID";
   ATTRIB DATE LENGTH=8 FORMAT=MMDDYY10. LABEL="Date";
RUN;

DATA parameters;
   MERGE infect.weights infect.temperature;
   BY ID Date;
RUN;

DATA parameters2;
   MERGE parameters infect.counts;
   BY ID Date;
RUN;

/* I have also tried to perform the merge within one data step and get the same results */

My results end up looking like this:

ID  Group  Date        Weight  Temp  X1  X2  X3  X4  X5  X6  X7
1   1      01/01/2001  100     100   .   .   .   .   .   .   .
1   1      01/01/2001  .       .     1   2   3   4   5   6   7
2   1      01/01/2001  125     101   .   .   .   .   .   .   .
2   1      01/01/2001  .       .     8   9   10  11  12  13  14

Thanks in advance!

HHutch · ‎10-31-2017

I have considered a mixed model incorporating a random herd effect. Since all variables of interest are herd-level, would I need a population-averaged/marginal (PA) model to accurately reflect the effects of management factors that vary between herds? I am currently attempting to do this with GENMOD until I figure out whether a count model is valid or not. If I am fitting a PA model, would my REPEATED statement be "repeated subject=herd" or "repeated subject=cow(herd)"? There is only one observation per cow, and my interest is at the herd level. However, there are a few instances where the same "Cow ID" number belongs to two different cows in two different herds. Again, thank you for all of your help!
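As a sketch of what the first option could look like in PROC GENMOD, assuming a cow-level data set with a binary outcome: the names cows, infected, and herdprev are hypothetical, and the exchangeable working correlation is only one possible choice. With one record per cow, subject=cow(herd) would make every cow its own cluster of size one, whereas subject=herd treats cows in the same herd as correlated and is unaffected by Cow ID values that repeat across herds.

```sas
proc genmod data=cows descending;      /* descending: model the probability of infected=1 */
    class herd;
    model infected = herdprev / dist=binomial link=logit;
    repeated subject=herd / type=exch; /* exchangeable working correlation within herd */
run;
```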

HHutch · ‎10-31-2017

Does this assume that each set of cows has the same infection probability, or that each cow within a set has the same probability? I believe I could say the probability is the same within a set but not the same between sets.

HHutch · ‎10-30-2017

@Rick_SAS I appreciate your feedback and time. The intention of my analysis is to use different herd-level factors (such as herd prevalence) as potential predictors of cows becoming sick. I am fairly new to statistical analysis, so I could be completely wrong and a Poisson rate model may be inappropriate. I have considered a logistic regression model accounting for random herd effects. However, I do not have cow-level predictors. That is why I thought a count model might be more appropriate. Yes, I did use the article you linked. Thanks again!

HHutch · ‎10-30-2017

I am trying to explain disease incidence across 108 cow herds. Each row represents a different herd. HerdConv is the total number of cows that became infected and HerdRetest is the total number of cows that were retested. This was during a 2-year follow-up period, and cows were retested only once. Therefore, I can't really do a rate analysis, because I do not know the time at which they became infected. I just know that they were infected within the two-year period. So I was trying to use a count model while keeping in mind that the number retested per herd was not the same.
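Since each row holds a number infected and a number retested, two common ways to account for unequal herd sizes are a Poisson model with a log offset and a binomial model in events/trials form. The sketch below shows both, assuming herdsummary contains HerdConv and HerdRetest as described; the data set herdsummary2 and the predictor herdprev are hypothetical.

```sas
data herdsummary2;
    set herdsummary;
    logherdretest = log(HerdRetest);   /* the offset must be on the log scale */
run;

/* Count model: expected infections scale with the number retested. */
proc genmod data=herdsummary2;
    model HerdConv = herdprev / dist=poisson link=log offset=logherdretest;
run;

/* Events/trials model: infections out of cows retested in each herd. */
proc genmod data=herdsummary2;
    model HerdConv/HerdRetest = herdprev / dist=binomial link=logit;
run;
```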

HHutch · ‎10-30-2017

Thank you for your response, @Rick_SAS. I did indeed use your article. I do see that the code is not fully showing, and I am uncertain why. I believe I have included the offset in my calculations. The reason I overlaid the raw counts with the PMF was to show visually what the outcome actually was versus what it should be when the offset is included. So should I only be using the CDF? Or maybe I am misunderstanding. If so, how do I graph the predicted rate versus the observed rate?

HHutch · ‎10-30-2017

I am experiencing a problem similar to the one discussed in the following post: https://communities.sas.com/t5/SAS-GRAPH-and-ODS-Graphics/proc-genmod-graphics-for-count-data-model-fit-assessment/m-p/353052/highlight/true#M12249

I am attempting to use graphics to illustrate the fit of both a Poisson model and a negative binomial model to my data prior to model building. The data are counts with an offset, examining incidence risk. The outcome is the number of new infections and the offset is the natural log of the number of susceptible subjects. According to the fit statistics (Deviance/Pearson Chi-Square), a negative binomial distribution fits better. However, when I try to fit expected distribution curves to my data (PDF) or to create CDF curves, the negative binomial clearly does not fit my data. I believe my problem is in the parameter estimation for my curves and not an actual lack of fit. When I use my code without an offset, both graphs are produced "correctly". When I include the offset, the negative binomial graph does not fit. Can anyone provide assistance or suggestions on parameter estimation?

Below is the code (including the offset), which I assembled from various resources online and within the SAS community. I did not include the code without the offset, for the sake of space, but can provide it if desired. That code is identical except that the offset is omitted from my PROC GENMOD statements. I have also included my data.

/* Tabulate counts and plot data */
proc freq data=herdsummary;
   tables HerdConv / out=FreqOut plots=FreqPlot(scale=percent);
run;

proc univariate data=herdsummary;
   var herdconv;
   cdfplot herdconv / vscale=proportion odstitle="Empirical CDF" odstitle2="PROC UNIVARIATE";
   ods output cdfplot=outCDF; /* data set contains ECDF values */
run;

**********************CDF- OFFSET************************;
proc genmod data=herdsummary;
   TITLE "Poisson With Offset";
   model herdconv= /dist=Poisson offset=logherdretest dscale; /* No variables are specified, only mean is estimated. */
   output out=PoissonFitOS p=lambda;
run;

/* Save Poisson parameter lambda in macro variables. */
data _null_;
   set PoissonFitOS(obs=1);
   call symputx('lambda', lambda);
run;

/* Use Min/Max values and the fitted Lambda to create theoretical Poisson Data. */
data TheoreticalPoissonOS;
   do HerdConv= 0 to 15;
      po=cdf('Poisson', HerdConv, &lambda);
      output;
   end;
run;

proc genmod data=herdsummary;
   TITLE "Negative Binomial with Offset";
   model HerdConv= /dist=NegBin offset=logherdretest; /* No variables are specified, only mean is estimated */
   output out= herdsummaryNB p=NBPred;
   ods output parameterestimates=NegBinParametersOS;
run;

/* Transpose Data. */
proc transpose data=NegBinParametersOS out=NegBinParametersOS;
   var estimate;
   id parameter;
run;

/* Calculate k and p from intercept and dispersion parameters. */
data NegBinParametersOS;
   set NegBinParametersOS;
   k = 1/(dispersion);
   p = 1/(1+exp(intercept)*dispersion);
run;

/* Save k and p in macro variables. */
data _null_;
   set NegBinParametersOS;
   call symputx('k', k);
   call symputx('p', p);
run;

/* Calculate theoretical Negative Binomial CDF based on fitted k and p. */
data TheoreticalNegBinOS;
   do HerdConv=0 to 15;
      NegBin=cdf('NegBinomial', HerdConv, &p, &k);
      output;
   end;
run;

data testOS;
   Merge theoreticalpoissonOS TheoreticalNegBinOS outcdf;
Run;

proc sgplot data=testOS;
   TITLE " Cumulative with Offset";
   step x=ECDFX y=ECDFY / lineattrs=(pattern=1 thickness=2) legendlabel="ECDF";
   step x=HerdConv y=PO / lineattrs=(pattern=15 thickness=2) legendlabel="Poisson Model Fit";
   Step x=Herdconv y=NegBin / lineattrs=(pattern=2 thickness=2) legendlabel="NegBin Model Fit";
   xaxis grid label="x" offsetmin=0.05 offsetmax=0.05;
   yaxis grid min=0 label="Cumulative Proportion";
run;

***********;
/* Fit Poisson distribution to data. */
proc genmod data=herdsummary;
   TITLE "Poisson with Offset";
   model herdconv= /dist=Poisson offset=logherdretest; /* No variables are specified, only mean is estimated. */
   output out=PoissonFitO p=lambda;
run;

/* Save Poisson parameter lambda in macro variables. */
data _null_;
   set PoissonFitO(obs=1);
   call symputx('lambda', lambda);
run;

/* Use Min/Max values and the fitted Lambda to create theoretical Poisson Data. */
data TheoreticalPoissonO;
   do HerdConv= 0 to 15;
      po=pdf('Poisson', HerdConv, &lambda);
      output;
   end;
run;

/* Fit Negative Binomial distribution to data. */
proc genmod data=herdsummary;
   TITLE "Negative Bin with Offset";
   model herdconv= /dist=NegBin offset=logherdretest; /* No variables are specified, only mean is estimated */
   ods output parameterestimates=NegBinParametersO;
run;

/* Transpose Data. */
proc transpose data=NegBinParametersO out=NegBinParametersO;
   var estimate;
   id parameter;
run;

/* Calculate k and p from intercept and dispersion parameters. */
data NegBinParametersO;
   set NegBinParametersO;
   k = 1/dispersion;
   p = 1/(1+exp(intercept)*dispersion);
run;

/* Save k and p in macro variables. */
data _null_;
   set NegBinParametersO;
   call symputx('k', k);
   call symputx('p', p);
run;

/* Calculate theoretical Negative Binomial PMF based on fitted k and p. */
data TheoreticalNegBinO;
   do herdconv=0 to 15;
      NegBin=pdf('NegBinomial', herdconv, &p, &k);
      output;
   end;
run;

/* Merge the datasets for plotting. */
data PlotDataO(keep=herdconv freq po negbin);
   merge TheoreticalPoissonO TheoreticalNegBinO FreqOut;
   by herdconv;
   freq = PERCENT/100;
run;

/* Overlay fitted densities with original data. */
title 'Count data overlaid with fitted distributions-WITH OFFSET';
proc sgplot data=PlotDataO noautolegend;
   vbarparm category=herdconv response=freq / legendlabel='Count Data';
   series x=herdconv y=po / markers markerattrs=(symbol=circlefilled color=red) lineattrs=(color=red) legendlabel='Fitted Poisson PMF';
   series x=herdconv y=NegBin / markers markerattrs=(symbol=squarefilled color=green) lineattrs=(color=green) legendlabel='Fitted Negative Binomial PMF';
   xaxis display=(nolabel);
   yaxis display=(nolabel);
   keylegend / location=inside position=NE across=1;
run;
title;

I have attached graphs generated with and without the offset. As you can see, the code "works/gives an expected distribution" when an offset is not included. However, when an offset IS included, there is a spike at zero not explained by the data.
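One possible explanation for the spike at zero, offered as an assumption rather than a diagnosis: with an offset, exp(intercept) is the estimated risk per cow retested, not the mean count per herd, and the predicted mean differs by herd (HerdRetest × exp(intercept)). A single curve built from exp(intercept), or from the first herd's predicted value, would then put most of its mass near zero. The sketch below shows one alternative, which averages the herd-specific negative binomial PMFs to get an expected distribution for the whole sample. The data set names NBPredOS, NBParmsOS, NBMixOS, and ExpectedNBOS and the macro variable disp are hypothetical.

```sas
proc genmod data=herdsummary;
    model HerdConv = / dist=NegBin offset=logherdretest;
    output out=NBPredOS p=mu;                 /* mu is herd-specific and includes the offset */
    ods output ParameterEstimates=NBParmsOS;
run;

/* Save the estimated dispersion in a macro variable. */
data _null_;
    set NBParmsOS;
    where Parameter = 'Dispersion';
    call symputx('disp', Estimate);
run;

/* Evaluate the NB PMF for each herd at its own predicted mean. */
data NBMixOS(keep=HerdConv pmf);
    set NBPredOS(keep=mu);
    k = 1 / &disp;
    p = 1 / (1 + mu * &disp);                 /* herd-specific p, so the NB mean equals mu */
    do HerdConv = 0 to 15;
        pmf = pdf('NegBinomial', HerdConv, p, k);
        output;
    end;
run;

/* Average across herds to get the expected marginal PMF. */
proc means data=NBMixOS noprint nway;
    class HerdConv;
    var pmf;
    output out=ExpectedNBOS(drop=_type_ _freq_) mean=NegBin;
run;
```

ExpectedNBOS can then be merged with FreqOut by HerdConv for the overlay plot in place of the single-curve values; the same averaging applies to the Poisson PMF and to the CDFs.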

HHutch · ‎10-26-2017

Hello, I know I am a little late to this post, but I hope it is still recent enough that you can provide assistance. I am also trying to use GENMOD graphics for model fit assessment to show that a negative binomial distribution is a better fit than a Poisson. I found the same Pedan paper and wanted to create a similar graph. I have been successful doing this without an offset. However, my data require that an offset be included. In the Pedan paper, they state an offset was used. I was wondering if you had any luck including an offset in your analysis? Fit statistics suggest the negative binomial model fits the data best; however, when visualized, this is not the case. I believe I am estimating the parameters incorrectly, given that an offset should somehow be incorporated. Can you provide any help, suggestions, or references? Thanks in advance!


Re: Problem with SAS Match merging interleaving the data sets

Re: Problem with SAS Match merging interleaving the data sets

Problem with SAS Match merging interleaving the data sets

Re: Count Model Fit: Parameter Estimation for Visual Assessment

Re: Count Model Fit: Parameter Estimation for Visual Assessment

Re: Count Model Fit: Parameter Estimation for Visual Assessment

Re: Count Model Fit: Parameter Estimation for Visual Assessment

Re: Count Model Fit: Parameter Estimation for Visual Assessment

Count Model Fit: Parameter Estimation for Visual Assessment

Re: proc genmod graphics for count data model fit assessment
