Hello, this seems like it should be an easy answer, but I've been stumped for a couple of days. Very sorry if this is much simpler than I'm thinking!
About the data: I have two sets of data with the number of surgeries performed per year from 2002 to 2015. The two sets represent two different disease patterns; however, both received the same surgical procedure.
What I am trying to do: I'd like to figure out what kind of statistical procedure would determine if there is any significant difference between the two temporal trends. I also want to run another two tests to see if there is a significant change from year to year within the trends themselves.
The data:
data graph1;
input year ICM NICM;
datalines;
2002 4.13 7.10
2003 5.52 8.94
2004 3.73 7.27
2005 4.59 8.12
2006 4.88 7.02
2007 5.42 6.88
2008 8.66 8.29
2009 8.93 7.36
2010 8.88 7.64
2011 9.92 8.77
2012 9.60 6.20
2013 9.49 6.68
2014 9.09 6.23
2015 7.16 3.48
;
run;
My question: What tests should I use for this? I feel like a simple proc glm is insufficient. I also don't think I need ARIMA, because I'm not trying to forecast anything.
Thank you so much for reading!
Since you haven't told us anything about the response, I'll just assume it is approximately normal in distribution. The following uses the method described in this note to compare slopes. The first analysis fits a model allowing for separate intercepts, a slope for the reference disease (NICM), and an interaction that represents the ICM offset from that slope. The test of the offset (interaction) parameter is a test of equal slopes. Since it is obviously significant, the next analysis fits a model allowing separate intercepts and separate slopes for the two diseases. The effect plot shows the very different slopes. Also, since the two slope parameters are both significant, that indicates that both disease responses change significantly over time.
If the response is not normal, then a different distribution could be selected in both PROC GENMOD steps.
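/* Reshape to long format: one row per year and disease */
data graph1;
  input year ICM NICM;
  yearidx = year - 2001;                /* 1 = 2002, ..., 14 = 2015 */
  disease = 'ICM '; y = ICM;  output;   /* trailing blank sets length 4 so 'NICM' is not truncated */
  disease = 'NICM'; y = NICM; output;
  datalines;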
2002 4.13 7.10
2003 5.52 8.94
2004 3.73 7.27
2005 4.59 8.12
2006 4.88 7.02
2007 5.42 6.88
2008 8.66 8.29
2009 8.93 7.36
2010 8.88 7.64
2011 9.92 8.77
2012 9.60 6.20
2013 9.49 6.68
2014 9.09 6.23
2015 7.16 3.48
;
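/* Model 1: separate intercepts, NICM slope, and ICM slope offset (test of equal slopes) */
proc genmod data=graph1;
  class disease;
  model y = disease yearidx yearidx*disease / noint;
run;
/* Model 2: separate intercept and separate slope for each disease */
proc genmod data=graph1;
  class disease;
  model y = disease yearidx*disease / noint;
  effectplot;
run;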
...but there is some evidence of curvature, so that might require a more sophisticated model that allows for it. A similar approach could be taken to check for differing quadratic effects. Or perhaps splines could be used to allow for greater flexibility.
proc sgplot;
loess y=y x=year / group=disease;
run;
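As a rough sketch of how the same approach might be extended to quadratic effects, the two steps below add a squared term to each model. This assumes the long-format graph1 data set created above and the default GLM coding of the CLASS variable, under which NICM is the reference level. It is one possible setup, not the only way to model the curvature.

```sas
/* Sketch only: assumes the long-format graph1 data set built above
   (variables y, disease, yearidx). */

/* Step 1: test for differing curvature. With NOINT and default GLM coding,
   yearidx*yearidx is the quadratic coefficient for the reference level
   (NICM), and the ICM row of yearidx*yearidx*disease is the ICM offset
   from it. */
proc genmod data=graph1;
  class disease;
  model y = disease yearidx*disease
            yearidx*yearidx yearidx*yearidx*disease / noint;
run;

/* Step 2: separate intercept, linear, and quadratic terms per disease. */
proc genmod data=graph1;
  class disease;
  model y = disease yearidx*disease yearidx*yearidx*disease / noint;
  effectplot;
run;
```

In Step 1, the test of the ICM offset parameter plays the same role for the quadratic coefficients that the interaction test played for the slopes. With only 14 observations per disease, the extra terms leave few degrees of freedom, so the results should be read cautiously.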
Hi StatDave,
Thank you so very much for your answer(s). I appreciate it so much. I was so stuck on the fact that there was time involved that I didn't realize this could be approached using GENMOD. Would it be appropriate to say that the response variable has a Poisson distribution? Each observation is a discrete count representing the percentage of ablation surgeries performed per year for each group (NICM and ICM).
I'm not sure how to check for differing quadratic effects, though I did see the curvature from running the sgplot you mentioned.
I am attaching a picture of the graph that I created the other day with the two temporal trends in case it is helpful for visualization.
Each observation is a discrete count representing the percentage of ablation surgeries performed per year for each group (NICM and ICM).
Your percent response is an aggregated binary (ablative surgery or not) response. If you have the counts of the number of ablative surgeries and the total number of surgeries making up the percentages, you would specify them in the MODEL statement using events/trials syntax and specify that the distribution is binomial:
model Nablative/Ntotal=disease yearidx yearidx*disease / dist=binomial noint;