04-06-2012 10:39 AM
Can someone help me understand why my PROC LIFETEST results for multiple comparison tests are all significant given the Kaplan-Meier curves (in attached .png file)? Seems like groups 1 and 2 should NOT be significant. However, the paired comparison test shows up as significant (in attached .pdf). Here is the code I'm running:
proc lifetest data=getdata2 plots=s(atrisk=0 to 10 by 2 nocensor test) notable;
strata group / diff=all adjust=bon;
run;
When I run a separate procedure restricted to only groups 1 and 2, the log-rank test is highly non-significant (log-rank p=0.82), which is what I would expect given the curves.
Please help me understand. Thanks!
04-09-2012 02:55 AM
The log-rank test has the most power when the two survival curves are parallel (i.e., under proportional hazards).
But from your plot it looks like the curves cross, so you might use a test with different weights, such as the Wilcoxon test, instead of the log-rank test.
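To make the distinction concrete: both tests sum weighted (observed - expected) event counts over the distinct event times. The log-rank test weights every event time equally, while the Gehan-Wilcoxon test weights by the number at risk, so it emphasizes early differences where the risk sets are large. A minimal stdlib-only Python sketch of the textbook two-sample formulas (the function name is illustrative, and this is not how PROC LIFETEST computes them internally):

```python
def two_sample_test(times1, events1, times2, events2, wilcoxon=False):
    """Two-sample log-rank (or Gehan-Wilcoxon) chi-square statistic.

    times*/events* are parallel lists; events[i] is 1 for an observed
    event, 0 for a censored observation.
    """
    data = ([(t, e, 1) for t, e in zip(times1, events1)] +
            [(t, e, 2) for t, e in zip(times2, events2)])
    event_times = sorted({t for t, e, _ in data if e == 1})
    num = 0.0  # weighted sum of (observed - expected) events in group 1
    var = 0.0  # weighted hypergeometric variance of that sum
    for t in event_times:
        n1 = sum(1 for tt, _, g in data if g == 1 and tt >= t)  # at risk
        n2 = sum(1 for tt, _, g in data if g == 2 and tt >= t)
        d1 = sum(1 for tt, e, g in data if g == 1 and tt == t and e == 1)
        d2 = sum(1 for tt, e, g in data if g == 2 and tt == t and e == 1)
        n, d = n1 + n2, d1 + d2
        w = float(n) if wilcoxon else 1.0  # Gehan-Wilcoxon weights by n at risk
        num += w * (d1 - d * n1 / n)
        if n > 1:
            var += w * w * d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    return num * num / var
```

The returned statistic is referred to a chi-square distribution with 1 degree of freedom (values above 3.84 are significant at the 0.05 level). Because the weights differ, the two tests can rank the same pair of curves differently when the curves cross.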
04-09-2012 03:53 PM
Thanks Ksharp, I appreciate the response. The results of the Wilcoxon tests mirror that of the log-rank tests in that each paired comparison is significant. Yet the curves for groups 1 and 2 appear so similar and cross at various time points (early, mid, and late) that it just does not seem that these should be found to be statistically different. I'm still confused and am wondering if I should trust the DIFF=ALL option in the STRATA statement for PROC LIFETEST.
04-10-2012 01:20 PM
There's definitely something wrong somewhere...
You have a lot of censoring in your data but I doubt that's the issue.
If you run the proc with only the two groups (i.e., groups 1 and 2), the raw p-value should equal the raw p-value in your multiple-comparison adjustment table, which it clearly doesn't according to what you're seeing.
Your data show that two observations are being excluded because of missing values, so you'd want to make sure the same two are excluded from both analysis sets. I doubt two observations are that influential regardless, but it's worth checking.
Can you post the plot that you get from running the two groups alone and your log from both of the procs?
Your best (and fastest) bet may be to contact tech support, though.
04-10-2012 02:41 PM
Thanks for taking the time to help. Attached are the survival curves for just groups 1 and 2. Also attached is the log file from running the 4 group analysis first, then the 2 group analysis. I also included the output. I will follow up on those 2 missing observations and see if I can get those corrected.
If you don't see anything else, I probably will contact tech support. It does provide some comfort to me that I'm not the only one who thinks something is off. I'm not 100% convinced that it isn't me, though.
04-10-2012 03:30 PM
Ok...I'd suggest contacting tech support.
This is the closest Note I could find to your issue, and it doesn't seem very relevant.
Still, you could try the sort and see whether it's a similar issue, depending on which version of SAS you're on.
Please post the answer back to the forum.
Just a note: you can use timelist=0 to 10 by 2 on the PROC LIFETEST statement to reduce your output some more, so you don't get the long survival tables as well.
04-12-2012 11:19 AM
Thanks Reeza for the timelist option suggestion. I will definitely use it in the future.
SAS support is really terrific about responding. They always seem to have a quick turnaround time and are extremely helpful. Below is the unedited response I received. It may well be as they suggest, but I remain highly skeptical.
"Statistical significance has much to do with the sample size. If you have a large enough sample size, you can detect very small differences. Although the curves for Group1 and Group2 appear to be close to each other, the large sample sizes may still detect a difference. If you use the data for Group1 and Group2 alone, the non-significant difference could be due to the smaller sample size.
If you look at the chi-square values in the multiple-comparison results, you see that Group3 is different from all other groups, and that the Group1/Group2 and Group2/Group4 pairs are less different."
+ - - - - - -
The gist of the idea here is that your results are theoretically and practically possible to encounter, and we don't suspect anything is amiss. We have also been notified of situations where the overall test was significant but none of the pairwise tests were significant, and that seemingly anomalous situation is also theoretically and practically possible.
Although not in the survival analysis realm, Milliken & Johnson (1992), in their book "Analysis of Messy Data", present a seemingly anomalous multiple-comparisons example in which means that are far apart were not significantly different, while means that were closer together were significantly different. In their discussion, as I recall, the culprit was the varying/unbalanced sample sizes among the groups. I don't have access to Milliken & Johnson at present, but I believe this example is in Chapter 3, at or near the end.
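Tech support's sample-size point is easy to demonstrate numerically. A Pearson chi-square on a 2x2 table is used here only because it is compact (the counts are made up for illustration); the same scaling drives the log-rank statistic:

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table [[a, b], [c, d]],
    no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

small = chi2_2x2(60, 40, 50, 50)    # 100 per group, 60% vs 50% event rate
big = chi2_2x2(600, 400, 500, 500)  # same proportions, 10x the sample size
```

Multiplying every cell by 10 leaves the event proportions unchanged but multiplies the statistic by 10: the same 60% vs 50% difference is non-significant at 100 per group and highly significant at 1,000 per group. A visually small gap between Kaplan-Meier curves can behave the same way when the groups are large.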
04-12-2012 12:22 PM
I did notice that your group sizes are very different, but I thought the following would hold:
If you run the proc with only the two groups (i.e., groups 1 and 2), the raw p-value should equal the raw p-value in your multiple-comparison adjustment table.
Message was edited by: Reeza
It has to do with the variance estimates, which change when you run a model with only two groups versus with all four groups. Compare your variance estimates for Group 1 and 2 from the model with all 4 groups (-6.5257) to the model with only two groups (-27.1545).
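Reeza's point can be seen directly in the log-rank building blocks: the expected event count and the hypergeometric variance at each event time are computed from everyone at risk at that time in the model, so the same group 1 observations produce different numbers depending on whether groups 3 and 4 are in the risk sets. A rough stdlib-only Python sketch with made-up data (not the PROC LIFETEST implementation, and it ignores the covariance terms SAS uses for the pairwise contrasts):

```python
def group_o_minus_e(groups, target):
    """Observed-minus-expected events and variance term for `target`,
    computed against the risk sets of *all* groups passed in.

    groups: {label: (times, events)}; events[i] = 1 event, 0 censored.
    """
    data = [(t, e, g) for g, (ts, es) in groups.items()
            for t, e in zip(ts, es)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    o_minus_e, var = 0.0, 0.0
    for t in event_times:
        n = sum(1 for tt, _, _ in data if tt >= t)   # total at risk at t
        ng = sum(1 for tt, _, g in data if g == target and tt >= t)
        d = sum(1 for tt, e, _ in data if tt == t and e == 1)
        dg = sum(1 for tt, e, g in data if g == target and tt == t and e == 1)
        o_minus_e += dg - d * ng / n
        if n > 1:
            var += d * (ng / n) * (1 - ng / n) * (n - d) / (n - 1)
    return o_minus_e, var

# Hypothetical data: groups 1 and 2 similar; group 3 fails early, group 4 late.
groups = {
    1: ([2, 4, 6, 8], [1, 1, 1, 0]),
    2: ([2, 5, 6, 9], [1, 1, 0, 1]),
    3: ([1, 1, 2, 3], [1, 1, 1, 1]),
    4: ([7, 8, 9, 10], [0, 1, 1, 1]),
}
oe_all, var_all = group_o_minus_e(groups, 1)
oe_sub, var_sub = group_o_minus_e({g: groups[g] for g in (1, 2)}, 1)
```

Running this, both the observed-minus-expected sum and the variance term for group 1 differ between the four-group and two-group risk sets, even though group 1's own observations are identical. That is why a pairwise p-value from DIFF=ALL on four groups need not match a rerun on the two-group subset.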
07-26-2017 02:32 PM
Hi, I had the same problem. Does anyone have a solution? Should I trust the p-value from the two-level subset comparison or from the multiple comparison? Thanks.