Hi,
I'm using proc genmod to compare differences in cost between surgical operations that took place in hospitals before and after they were enrolled in an intervention program designed to reduce costs. I have 5 years of data (each year, more hospitals joined the program, but once a hospital joined, none left). I hypothesize that hospitals that joined in the first year had the most time to receive the beneficial cost-reducing effects of the program, whereas hospitals that joined in the last year did not have much time and maybe wouldn't experience much (or any) cost reduction. So I would like to compare pre/post costs separately for each of the 5 years. I've also clustered on hospital because I figured those data are correlated.
This is the code I used:
proc genmod data=allthree order=formatted;
class hospital/ desc param=ref;
model cost=prepost cohort covariate1 covariate2/link=log dist=gamma;
repeated subject=hospital/type=ind;
estimate 'Pre, 2006 Cohort' intercept 1 ;
estimate 'Post, 2006 Cohort' intercept 1 prepost 1 ;
estimate 'Pre, 2007 Cohort' intercept 1 cohort 0 0 0 1 ;
estimate 'Post, 2007 Cohort' intercept 1 prepost 1 cohort 0 0 0 1 ;
estimate 'Pre, 2008 Cohort' intercept 1 cohort 0 0 1 0 ;
estimate 'Post, 2008 Cohort' intercept 1 prepost 1 cohort 0 0 1 0 ;
estimate 'Pre, 2009 Cohort' intercept 1 cohort 0 1 0 0 ;
estimate 'Post, 2009 Cohort' intercept 1 prepost 1 cohort 0 1 0 0 ;
estimate 'Pre, 2010 Cohort' intercept 1 cohort 1 0 0 0 ;
estimate 'Post, 2010 Cohort' intercept 1 prepost 1 cohort 1 0 0 0 ;
run;
prepost=1 if the procedure took place after the hospital joined the program and 0 if it took place before
cohort=0-4. 0 if they joined the first year, 1 if they joined the second year, etc.
I used the estimate statements to get (I think) the mean pre/post costs within each cohort.
Is there a way to determine whether the pre/post differences in cost were significantly different from each other, within each value of "cohort"?
Thanks so much.
First thing I noticed is that there is no interaction term in the model for cohort by prepost. As a result, your estimates (and lsmeans, which I'll get to later) will reflect exactly the same difference between pre and post for every cohort. I don't think this is what you want. Give the following a try:
proc genmod data=allthree order=formatted;
class hospital prepost cohort/ desc param=ref;
model cost=prepost cohort prepost*cohort covariate1 covariate2/link=log dist=gamma;
repeated subject=hospital/type=ind;
lsmeans prepost cohort prepost*cohort/ilink diff;
slice prepost*cohort/nof ilink sliceby=cohort diff; /* This will give a test of pre vs post for each level of cohort */
run;
Steve Denham
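A short derivation of why the interaction matters. The symbols are introduced here only for illustration: $\mu_{pc}$ is the mean cost at prepost level $p \in \{0,1\}$ in cohort $c$, $\beta_P$ is the prepost coefficient, $\gamma_c$ the cohort effect, $(\beta\gamma)_c$ the prepost-by-cohort interaction, and $\mathbf{x}'\boldsymbol{\delta}$ the covariate terms, held fixed.

$$
\begin{aligned}
\text{without interaction: } \log \mu_{pc} &= \beta_0 + \beta_P\,p + \gamma_c + \mathbf{x}'\boldsymbol{\delta}
\;\Rightarrow\; \frac{\mu_{1c}}{\mu_{0c}} = e^{\beta_P} \text{ for every } c,\\
\text{with interaction: } \log \mu_{pc} &= \beta_0 + \beta_P\,p + \gamma_c + (\beta\gamma)_c\,p + \mathbf{x}'\boldsymbol{\delta}
\;\Rightarrow\; \frac{\mu_{1c}}{\mu_{0c}} = e^{\beta_P + (\beta\gamma)_c}.
\end{aligned}
$$

Because the link is log, the pre/post comparison is a difference on the log scale, i.e. a ratio of mean costs. Without the interaction that ratio is forced to be identical in every cohort; testing whether all the $(\beta\gamma)_c$ are zero is a test of whether the ratios differ across cohorts.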
Thanks so much, Steve, that worked perfectly except I had to change the param=ref to param=glm (otherwise I got an error). Also, the slice command was red in my editor, but I ran it anyway and it worked without any error messages in the log, so I assume all is well. Thank you so much, I really appreciate it!
After quickly reading this topic, I think my problem/question is very similar.
I have data from a cohort study with 4 rounds. In each of these rounds, we asked respondents about smoking. We plotted the prevalence of smoking in different age groups, defined by their age at baseline. The time between rounds is 5 years. So, if a group is aged 40-49 (mean 45) at baseline (round 1), in round 3 (10 years later) this group will be aged 50-59 (mean 55). The prevalence of smoking in this group can be compared to the prevalence of the group aged 50-59 (mean 55) at baseline. Basically, what I then want to say is: a younger generation (40-49) smokes less at mean age 55 than an older generation (50-59) at mean age 55. The link below shows an example of the analysis I want to do (it's a link to a figure). In this example, the 40-49 generation smokes 9% less at age 55.
http://img35.imageshack.us/img35/2451/exampleej.jpg
In order to model these lines, a statistician advised us to use proc glimmix. The model I used is:
proc glimmix data=dataname initglm method=quad;
model smoking(event="1") = dum60 dum70 dum80 age age*dum60 age*dum70 age*dum80 / dist=binary link=logit cl covb s;
random intercept age / subject=id;
by sex;
run; quit;
dum60 = generation aged 60-69 at baseline
dum70 = generation aged 70-79 at baseline
dum80 = generation aged 80-89 at baseline
dum50 is the reference
Now I want to use estimate statements in order to test what I described above (in the figure). What is the difference between two generation at a predefined age? And is this difference significant?
I have tried to estimate the difference between the generation 60-69 at age 70 and the generation 70-79 at age 70 with these statements (based on the advice of the statistician):
estimate 'Diff 60-69 vs 70-79 at age 70' dum60 1 dum70 -1 age*dum60 70 age*dum70 -70 / cl; /* difference between the two lines */
estimate '60-69 at age 70' intercept 1 dum60 1 age 70 age*dum60 70 / ilink cl; /* prevalence of 60-69 group at age 70 */
estimate '70-79 at age 70' intercept 1 dum70 1 age 70 age*dum70 70 / ilink cl; /* prevalence of 70-79 group at age 70 */
The first statement should tell me if the difference is significant. The second and third statements should tell me the prevalences of the two groups at age 70. We used ilink to show us these prevalences (we want to present the differences + significance in a bar chart). The weird thing is, the prevalences as shown by ilink do not correspond with the prevalences we find in the raw data. They do not look similar at all! ilink gives me prevalences of 1E-7 etc, while 'real' prevalences based on the raw data are around 25% and 20% for those generations.
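Writing out the linear predictor shows what the three estimate statements compute. The symbols are introduced here for illustration only: $\beta_0$ is the intercept, $\beta_{60}$ and $\beta_{70}$ the dum60 and dum70 coefficients, $\beta_a$ the age slope, and $\beta_{a60}$, $\beta_{a70}$ the age*dum60 and age*dum70 coefficients.

$$
\begin{aligned}
\eta_{60}(70) &= \beta_0 + \beta_{60} + 70\,\beta_a + 70\,\beta_{a60},\\
\eta_{70}(70) &= \beta_0 + \beta_{70} + 70\,\beta_a + 70\,\beta_{a70},\\
\eta_{60}(70) - \eta_{70}(70) &= \beta_{60} - \beta_{70} + 70\,(\beta_{a60} - \beta_{a70}),\\
\hat p &= \frac{1}{1 + e^{-\eta}}.
\end{aligned}
$$

The first two lines are the second and third statements; their difference (the intercept and the common age term cancel) is exactly the first statement, a log odds ratio. ILINK applies the last line to a fixed-effects-only estimate, i.e. with the random intercept and random age slope set to zero, so it is a prevalence for a "typical" subject rather than the population-averaged prevalence seen in the raw data. Whether that alone explains values as small as 1E-7 would depend on the fitted coefficients.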
Is this the right method to test what I want to test, which is: the magnitude of the differences between two generations and whether this difference is significant.
If not, how else can it be done within the glimmix procedure?
Here is a link to my original topic, with a few other problems too:
Hope you guys can help me. Thnx in advance.
While I applaud the method to get the comparisons of interest, I can see where the problem might arise. The assumption of a linear response with age may be what is causing the problem. I see that you have the s option set in the model statement. What are the coefficients for the various terms? With those in hand, we might be able to work out what is going on.
I also wonder if the problem may be coming from the inclusion of age as a random effect--and that it is dominating the fixed effect. You may have to change the estimate statement from the BLUE form it is in to a BLUP form that includes the random effect.
Steve Denham
I have a very similar question, though I think mine is much simpler. I am analyzing some pre-post test data wherein the IV and DV are both binary. I am interested in testing the hypothesis that the post-test scores have changed significantly since an intervention that occurred between the pre test and the post test. For the univariate case, I have used the McNemar test, and this is straightforward. However, I'm unsure how best to control for a covariate (or two).
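For reference, the unadjusted McNemar test mentioned above can be run in PROC FREQ with the AGREE option, which prints McNemar's test for a 2x2 table. This is a minimal sketch assuming a hypothetical wide-format data set PREPOST_WIDE with one row per subject and 0/1 variables PRE and POST; the test uses only the discordant pairs and has no way to include covariates, which is why a model-based approach is needed for the adjusted version.

```sas
/* Hypothetical wide-format data: one row per subject,
   PRE and POST are the binary pre-test and post-test results (0/1). */
proc freq data=prepost_wide;
   tables pre*post / agree;   /* AGREE requests McNemar's test (and kappa) for a 2x2 table */
run;
```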
I have seen some examples where a GEE approach is used with several follow-up assessments (proc genmod is the appropriate command). However, I only have one follow-up time. I just want to get adjusted odds ratios for my pre-post test.
What is the best way to go about this?
Thanks in advance.
John Peipert
Does the code that I gave above fit your situation? Of course, changes in distribution, etc. would have to be made to get to the odds ratio. Maybe something like:
proc genmod ;
class dv iv prepost subjid/ desc param=glm;
model dv= iv prepost iv*prepost/dist=binomial;
repeated subject=subjid/type=ind;
lsmeans iv prepost iv*prepost/ilink diff oddsratio;
slice iv*prepost/sliceby=iv diff oddsratio; /* This will give a test of pre vs post for each level of iv */
run;
Steve Denham
Thanks, Steve. Is the variable "prepost" in your analysis the difference between the pre and post scores? My variables are dichotomous, so the pre value is either 1 or 0 and the post value is either 1 or 0.
Therefore, I have tried to write code that I hope accomplishes something similar to what you've suggested. I think of this as a test of the change from pre to post (one sample) within groups of another variable, my covariate (two levels, 1 and 0). I used a GEE model that treats the variable measured at pre and post as a longitudinal outcome, and looked at differences in this outcome over time within each group of the covariate. I indicated the pre and post values of this variable using a time variable (1=pre, 2=post). (See the code below.)
If the time variable is significant, then I know that there was a difference from pre to post (I think I'm calculating the odds ratios comparing post to pre), and if the interaction term is significant, I know that the change from pre to post differs between the two levels of the covariate.
Does this accomplish something similar to the code you wrote, or to what I've described above for that matter?
Thanks for your help on this.
Devin Peipert
Here is the code:
PROC GENMOD descending;
CLASS subjid covariate(ref="0") time(ref="1")/PARAM=effect;
MODEL outcome = covariate time covariate*time/ type3 dist=binomial link=logit;
REPEATED subject=subjid / type=un;
estimate'Beta' covariate 1 -1/ exp;
estimate'Beta' time 1 -1/ exp;
estimate'Beta' covariate*time 1 -1/ exp;
RUN;
The code looks good. My 'prepost' variable is exactly equivalent to your 'time': an indicator (class) variable representing the period at which things were measured. One thing I would change is the labels for the estimate statements, so that if you output them to a dataset, you can tell which estimate goes with which comparison.
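Following the suggestion about labels, here is a sketch of the same model with a distinct label on each ESTIMATE. It switches to PARAM=REF (an assumption, not the original PARAM=EFFECT) so that each two-level effect has a single parameter and each EXP estimate reads directly as an odds ratio; the label texts are only illustrative.

```sas
proc genmod descending;
   /* Reference coding: covariate=0 and time=1 (pre) are the reference levels */
   class subjid covariate(ref="0") time(ref="1") / param=ref;
   model outcome = covariate time covariate*time / type3 dist=binomial link=logit;
   repeated subject=subjid / type=un;
   /* Odds ratio, post (time=2) vs pre (time=1), among covariate=0 */
   estimate 'Post vs pre, covariate=0' time 1 / exp;
   /* Odds ratio, post vs pre, among covariate=1 */
   estimate 'Post vs pre, covariate=1' time 1 covariate*time 1 / exp;
   /* Ratio of the two odds ratios: does the pre-to-post change differ by covariate? */
   estimate 'Ratio of ORs, cov 1 vs 0' covariate*time 1 / exp;
run;
```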
Steve Denham
Thanks, Steve! All your feedback has been really helpful to me figuring this out for a first go.
I actually have one more question about the use of proc genmod in a slightly different scenario.
I need to model the effect of some variables measured at a baseline assessment on a serial outcome measured in subsequent years (but, notably, not at the baseline assessment). I'm wondering how appropriate the code I have below would be.
First, the data set-up is as follows, with id representing unique subjects, year representing the time, outcome representing a dichotomous outcome measure, bl_var1 representing a dichotomous predictor variable, and bl_var2 representing a second dichotomous predictor variable (the outcome is not measured in 2009, shown as missing):
id  year  outcome  bl_var1  bl_var2
1   2009  .        1        0
1   2010  0        1        0
1   2011  1        1        0
2   2009  .        0        1
2   2010  1        0        1
2   2011  0        0        1
The values of bl_var1 and bl_var2 are the same in every year for a given subject because the 2009 (baseline) measurement is the only measurement of these variables that exists. It has been assigned to the subsequent years as well.
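One way to do the carry-forward described above is a DATA step with RETAIN. This is a sketch assuming a hypothetical input data set LONG with one row per id and year, in which bl_var1 and bl_var2 are filled in only on the 2009 row; BASE1 and BASE2 are temporary helper variables introduced for illustration.

```sas
proc sort data=long;
   by id year;
run;

data long_bl;
   set long;
   by id;
   retain base1 base2;                         /* keep values across rows */
   if first.id then call missing(base1, base2); /* reset for each new subject */
   if year = 2009 then do;                     /* baseline row supplies the values */
      base1 = bl_var1;
      base2 = bl_var2;
   end;
   bl_var1 = base1;                            /* copy baseline onto 2010 and 2011 rows */
   bl_var2 = base2;
   drop base1 base2;
run;
```

A subject with no 2009 row would end up with missing baseline values on every row, which is usually what you want rather than borrowing another subject's values.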
Again, I want to model the impact of the baseline characteristics on the outcome measured in 2010 and 2011. I don't want to run separate models for 2010 and 2011 because I assume that the values for the outcome for these years are correlated on subjects.
So, I'm using the code below. I have specified the autoregressive working correlation matrix because it is a serial measurement of the same outcome.
PROC GENMOD descending;
WHERE YEAR NE 2009;
CLASS id bl_var1(ref="0") bl_var2 (ref="0") year (ref="2010")/PARAM= effect;
MODEL outcome = bl_var1 bl_var2 year
/ type3 dist=binomial link=logit;
REPEATED subject=id / type=ar(1) corrw;
estimate 'bl_var1' bl_var1 1 -1/ exp;
estimate 'bl_var2' bl_var2 1 -1/ exp;
estimate 'year' year 1 -1/ exp;
RUN;
Do I need an interaction between my baseline measures and year to see within subjects effects? E.g. would I want to add the following?
bl_var1*year bl_var2*year
If so, would these be interpreted as a change in the baseline value being associated with a change in the outcome from 2010 to 2011?
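A sketch of the model with the proposed interactions, plus estimates that make the interpretation explicit. It assumes PARAM=REF (instead of PARAM=EFFECT) and SUBJECT=ID; with reference coding, the interaction estimate is the factor by which the 2011 vs 2010 odds ratio is multiplied when bl_var1=1 rather than 0, holding bl_var2 fixed. Labels are illustrative.

```sas
proc genmod descending;
   where year ne 2009;                        /* outcome exists only in 2010-2011 */
   class id bl_var1(ref="0") bl_var2(ref="0") year(ref="2010") / param=ref;
   model outcome = bl_var1 bl_var2 year bl_var1*year bl_var2*year
         / type3 dist=binomial link=logit;
   repeated subject=id / type=ar(1) corrw;
   /* 2011 vs 2010 odds ratio when bl_var1=0 and bl_var2=0 */
   estimate '2011 vs 2010, bl1=0 bl2=0' year 1 / exp;
   /* 2011 vs 2010 odds ratio when bl_var1=1 and bl_var2=0 */
   estimate '2011 vs 2010, bl1=1 bl2=0' year 1 bl_var1*year 1 / exp;
   /* Ratio of those two odds ratios (the bl_var1 by year interaction) */
   estimate 'bl_var1 x year, ratio of ORs' bl_var1*year 1 / exp;
run;
```

With only two outcome years, AR(1), exchangeable, and unstructured working correlations each reduce to a single correlation parameter, so the choice among them should matter little here.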
Thanks again for your help.
Devin
I hope you have a lot of data. Binary covariates and response variables inevitably lead to a need for large sample sizes.
On to the problem at hand: if I had more years of data and treated year as a continuous covariate, I might proceed as Littell et al. (SAS for Mixed Models, 2nd ed.) do in their chapter on analysis of covariance. First, fit a model with separate slopes:
model outcome=bl_var1 bl_var2 year year*bl_var1 year*bl_var2....
and look at the test for the interactions. None of the other tests are really meaningful at this point. If these are non-significant, then you probably do not need the interactions to fit your results. If they are significant, then you will probably need to compare at various values of the continuous covariate (year).
However, if you only have two years of data, it makes no difference whether you consider year as a continuous or class variable; you don't have a range of values to choose from (this is the model you present). The year by baseline interactions are relatively straightforward in this limited context. I think your suggested inclusion of the interactions in the model will yield interpretable results of the sort you are after.
Steve Denham
Hi Steve,
As usual, thanks for your help.
As you say, I may run into power issues given the data. Those aside, given that I only have two years of data for my outcome measure, and that I'm strictly looking at the impact of baseline characteristics (taken the year preceding the first year of data for the outcome) on a serial outcome for two years, my concern was about whether or not the interactions would be meaningful.
As I wrote, I was thinking they may give me the effect of the baseline characteristics (yes or no) on whether the outcome changes between the two years of data I have for it. If I'm reading you correctly, it sounds like this is a fair interpretation of those terms.
Again, thanks for your time and consideration.
Devin
proc genmod data=allthree order=formatted;
class hospital prepost cohort/ desc param=ref;
model cost=prepost cohort prepost*cohort covariate1 covariate2/link=log dist=gamma;
repeated subject=hospital/type=ind;
lsmeans prepost cohort prepost*cohort/ilink diff;
slice prepost*cohort/nof ilink sliceby=cohort diff; /* This will give a test of pre vs post for each level of cohort */
run;
Using the code above,
How do you determine whether 1) the pre LS-means differ significantly across all 4 cohorts (test whether pre LS-mean in cohort 1 = pre LS-mean in cohort 2 = pre LS-mean in cohort 3 = pre LS-mean in cohort 4), 2) the post LS-means differ significantly across all cohorts (test whether post LS-mean in cohort 1 = post LS-mean in cohort 2 = post LS-mean in cohort 3 = post LS-mean in cohort 4), and 3) the pre vs post differences differ across the levels of cohort (test whether mean difference in cohort 1 = mean difference in cohort 2 = mean difference in cohort 3 = mean difference in cohort 4)?
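A sketch of how the three tests might come out of the same model, assuming PARAM=GLM (which, as reported earlier in the thread, LSMEANS and SLICE needed). A SLICE with SLICEBY=PREPOST and without NOF should give a joint test that the cohort LS-means are equal within the pre level and within the post level (tests 1 and 2). The Type 3 test of prepost*cohort tests whether the pre vs post difference, on the log (link) scale, is the same in every cohort (test 3). With a log link, equal log-scale differences mean equal post/pre ratios of mean cost, not equal dollar differences.

```sas
proc genmod data=allthree order=formatted;
   class hospital prepost cohort / desc param=glm;
   model cost = prepost cohort prepost*cohort covariate1 covariate2
         / link=log dist=gamma type3;   /* test 3: prepost*cohort row of the Type 3 table */
   repeated subject=hospital / type=ind;
   /* Tests 1 and 2: are the cohort LS-means equal within pre, and within post? */
   slice prepost*cohort / sliceby=prepost;
   /* Pre vs post within each cohort, as in the earlier reply */
   slice prepost*cohort / nof ilink sliceby=cohort diff;
run;
```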