01-30-2015 09:30 PM
I have hospitalization data (nsum) over a 17 year period for 3 different study areas (variables period and area). I have the log of the population as the offset. So, for my code I have entered:
proc genmod data=want;
class period area;
model nsum=area period area*period / dist=NB link=log offset=logpop type1;
run;
I am not really interested in estimates just for area or just for period, rather, I am interested in the difference between the areas over the time period. In the output for the interaction of area*period, there are 3 rows for each year in the 17 year period so I have area 1, area 2, and area 3 (reference group). My question is - how do I interpret this output? Is there a way to broadly look at if the 3 areas differ significantly over this time period or do I have to look at the p-values for each year for areas 1 and 2 compared to the reference group?
I also have the LR statistics for type 1 analysis, but I'm not really sure what this is telling me when I look at the p-value for the area*period term.
Thanks for your help!
02-02-2015 12:47 PM
First, change from type1 to type3 for the tests of the model effects. For this model GENMOD reports them as likelihood ratio chi-square statistics rather than F tests, and unlike the Type 1 tests they do not depend on the order in which the terms enter the model.
Second, review what a significant interaction means: the differences between areas are not the same at all time points or, conversely, the differences between time points are not the same for all areas. A graphical representation is usually very helpful in making this clear. If you plot all of the areas versus time, a significant interaction would appear as non-parallel trajectories in time.
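A minimal sketch of that kind of plot, in case it helps. It assumes WANT has one row per area and period, and that logpop is the natural log of the population, so the population can be recovered with exp(). The names plotdata and rate are made up for illustration.

```sas
/* Assumption: one row per area-period in WANT, and logpop = log(population). */
data plotdata;
  set want;
  rate = nsum / exp(logpop);   /* observed hospitalizations per person */
run;

/* SERIES connects points in data order, so sort by period within area. */
proc sort data=plotdata;
  by area period;
run;

proc sgplot data=plotdata;
  series x=period y=rate / group=area markers;
  /* Log axis to match the log link; it needs strictly positive rates. */
  yaxis type=log label="Observed rate (log scale)";
  xaxis label="Period";
run;
```

With the log link, "no interaction" corresponds to roughly parallel lines on the log scale, which is why the y axis is logged here. If any area-period has zero hospitalizations, drop the type=log option.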
02-02-2015 09:24 PM
Thanks for your help. I do understand what a significant interaction means. I had plotted everything before running the model in SAS and was planning to use the statistics to confirm what looked significant in my plots. So I guess my question is whether I should be looking at the type3 p-value for the interaction, or whether I should be looking through all of the estimates and p-values for each year/area in comparison to the reference area. I would think it's the type3 p-value for the interaction that I should be looking at for the overall picture; however, someone had told me not to focus on the p-values. So I'm not entirely sure what I should be looking at/reporting. Thanks again!
02-04-2015 08:42 AM
I would use the type3 tests as the "searchlight" to illuminate specific comparisons of interest. Those could be obtained through the use of the LSMESTIMATE statement.
02-04-2015 09:35 PM
Thanks, Steve. I've been playing around with the LSMESTIMATE statement and have been reading up online, but am having difficulty translating this to my example of looking at an interaction between one variable (area) with 3 levels and another variable (period) with 17 levels.
I tried to use LSMEANS to print out the matrix so I could see where the 0's and 1's fall. It seems to me (though I could be completely wrong) that I need to spell out what I am trying to examine in the LSMESTIMATE statement, but I am very confused about how to do that with the numbers that follow the variables I specify. When I did a test run of the code, it said 'The level 0 is not valid for CLASS variable period. The level specifications for this variable must range from 1 to 17'. The code I tried is as follows:
proc genmod data=want;
class area period;
model nsum=area period area*period / dist=NB link=log offset=logpop type3;
**I had tried something like [1 0 0] [0 1 0] [0 0 1] and that gave the error - do not understand how to put in the numbers to examine the area*period interactions**
Hoping someone can help with a start on what I need to follow that 'lsmestimate area*period' statement. Thanks!
02-05-2015 03:19 PM
I would start with the positional syntax, though that is mostly because it is how I first learned to write these out.
So, for simplicity, suppose there were 3 areas and 4 periods. You have found a significant interaction, so it is appropriate to make comparisons between areas at the various time points:
lsmestimate area*period 'Area 1, comparing period 1 to period 2' 1 -1 0 0   0 0 0 0   0 0 0 0,
                        'Area 1, comparing period 1 to period 3' 1 0 -1 0   0 0 0 0   0 0 0 0,
                        'Period 1, comparing area 1 to area 2'   1 0 0 0   -1 0 0 0   0 0 0 0;
This gives 3 simultaneous comparisons. The first block of four coefficients is area 1 at periods 1 through 4, then a space (ONLY FOR READABILITY), then area 2 at its 4 periods, another space for readability, and then area 3 at its 4 periods. This ordering assumes area is listed before period in the CLASS statement, as in your second version of the code. In this way, any combination of area by period lsmeans can be compared. Check out the Shared Concepts>ESTIMATE statement part of the documentation. It points out that LSMESTIMATE is constructed in the same way as ESTIMATE; it just uses the lsmeans rather than the solution vector.
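For the real 3 x 17 layout, writing out 51 positional coefficients per row gets tedious, so here is a hedged sketch using the nonpositional (bracket) syntax instead. Each bracket is [coefficient, area-level period-level], and the levels are ordinal positions from the Class Level Information table, so they run from 1 to 3 and 1 to 17. That would be consistent with the error quoted earlier: in something like [1 0 0] the zeros appear to be read as level numbers rather than coefficients. The particular comparisons below are only examples, not a recommendation of which contrasts to report.

```sas
proc genmod data=want;
  class area period;   /* area first, so each bracket lists the area level, then the period level */
  model nsum = area period area*period / dist=nb link=log offset=logpop type3;
  /* [coefficient, area-level period-level] */
  lsmestimate area*period
    'Period 1: area 1 vs area 2'    [1, 1 1]   [-1, 2 1],
    'Period 17: area 1 vs area 2'   [1, 1 17]  [-1, 2 17],
    'Area 1: period 1 vs period 17' [1, 1 1]   [-1, 1 17]
    / e exp cl;   /* E prints the coefficients, EXP exponentiates, CL adds confidence limits */
run;
```

The E option prints the coefficient matrix, so you can check that each 1 and -1 landed on the cell you intended. In the positional form, with area listed first in the CLASS statement, the cell for area a and period p sits at position (a-1)*17 + p. Because the model uses a log link, the exponentiated estimates from EXP can be read as rate ratios between the two cells being compared.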