LisavH
Calcite | Level 5

Hi all,

 

Because of missing data, I performed multiple imputation n times using PROC MI, resulting in n imputed datasets. Next, I want to perform propensity score matching (PSM) followed by a survival analysis, to compare the survival of two treatment groups matched on several variables. But when I perform PSM, it is done separately on each of the n imputed datasets, resulting in n slightly different totals of matched patients. Because of this, I also get n survival curves, and I just want one pooled curve. I know about PROC MIANALYZE; however, I do not think it solves my problem. Does anyone know a way of pooling the imputed data right after PROC MI, or after PSM (although I think this is not possible)?

 

Many thanks in advance! 

7 REPLIES
LisavH
Calcite | Level 5

Hi Jos, 

 

Thank you for your reply! I have seen this paper before (should have mentioned it, sorry...) and it indeed describes how to pool results from analyses on your imputed data. However, I hope to find a step in between that creates one pooled dataset, so I can perform analyses on this one dataset without having to pool afterwards (with PROC MIANALYZE). I will keep on searching; should I find something useful, I will post it here.

 

All the best, 

Lisa

SteveDenham
Jade | Level 19

I am not sure I understand the result you are looking for.  PROC MI will give you as many imputations of your data as you wish, and then MIANALYZE pulls together the survival analyses generated by analyzing each imputed dataset. So long as you can give MIANALYZE a parameter estimate and a standard error for each imputation, you can get a pooled result.  The hard part here is which survival analysis procedure you are using.  LIFETEST won't give parameter estimates, while LIFEREG and PHREG will.  Note that MIANALYZE assumes the estimates are approximately normally distributed, so it may be necessary to transform relative risk or hazard ratio estimates and their standard errors (typically to the log scale) before using MIANALYZE.  PHREG's parameter estimates are already log hazard ratios, so they can be pooled directly and exponentiated afterwards.
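A minimal sketch of that pooling step, assuming you already have one log hazard ratio and one standard error per imputation. The dataset name work.parms, the effect name trt, and all the numbers are hypothetical placeholders; in practice the table would come from ODS OUTPUT ParameterEstimates= in a PROC PHREG run with BY _Imputation_.

```sas
/* Hypothetical per-imputation results: log hazard ratio for trt and its SE. */
/* MIANALYZE expects _Imputation_, a parameter name, Estimate and StdErr.    */
data work.parms;
  length Parameter $8;
  input _Imputation_ Parameter $ Estimate StdErr;
  datalines;
1 trt -0.36 0.17
2 trt -0.41 0.18
3 trt -0.33 0.17
4 trt -0.39 0.18
5 trt -0.35 0.17
;
run;

/* Pool the log hazard ratios across imputations (Rubin's rules) */
proc mianalyze parms=work.parms;
  modeleffects trt;
  ods output ParameterEstimates=work.pooled;
run;

/* Back-transform the pooled estimate and its confidence limits to the HR scale */
data work.pooled_hr;
  set work.pooled;
  HR       = exp(Estimate);
  HR_lower = exp(LCLMean);
  HR_upper = exp(UCLMean);
run;

proc print data=work.pooled_hr noobs;
  var Parm HR HR_lower HR_upper;
run;
```

Pooling on the log scale and exponentiating only at the end is what keeps the normality assumption reasonable.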

 

SteveDenham

LisavH
Calcite | Level 5

Thanks Steve for your reply!

 

I guess the real question is how to perform a pooled survival analysis on the imputed datasets when they differ in the total number of patients due to a matching procedure. Without matching I would indeed use PROC MIANALYZE as you described. But due to matching, the total number of patients differs per imputed dataset, so I think I cannot generate one pooled survival curve. Unless there is a way to create a new dataset with the pooled imputed values, resulting in the original number of patients. Then I could perform the matching on that dataset and run a survival analysis as if there were no imputed values (so a PROC PHREG without having to do PROC MIANALYZE afterwards). I hope I clarified my question a bit. Many thanks for your help.

SteveDenham
Jade | Level 19

Hi @LisavH 

 

Estimates of the hazard ratio shouldn't be greatly influenced by sample size, and differences in sample size will be accounted for in the magnitude of the standard errors, so unless the sizes of the imputed datasets differ by an order of magnitude, I would not be greatly concerned.
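To make this concrete, here is a rough sketch of the whole workflow: impute, match within each imputation, fit the Cox model within each imputation, then pool. It assumes PROC PSMATCH is used for the matching (the thread does not say which procedure was used), and the toy data, variable names (trt, age, time, event), caliper and seeds are all hypothetical. How to account for the matched pairs in the Cox model is a separate choice that this sketch leaves out.

```sas
/* Hypothetical toy data: treatment depends on age, about 15% of age is missing */
data work.raw;
  call streaminit(2024);
  do id = 1 to 400;
    age   = 60 + 10*rand('normal');
    trt   = rand('bernoulli', logistic(-0.5 + 0.03*(age - 60)));
    time  = rand('exponential') / (0.10 * exp(-0.4*trt + 0.02*(age - 60)));
    event = (time < 15);            /* administrative censoring at t = 15 */
    time  = min(time, 15);
    if rand('uniform') < 0.15 then age = .;
    output;
  end;
run;

/* Step 1: five imputed datasets, stacked and indexed by _Imputation_ */
proc mi data=work.raw nimpute=5 seed=12345 out=work.mi_out;
  var trt age time event;
run;

/* Step 2: 1:1 greedy matching on the propensity score, separately per imputation */
proc psmatch data=work.mi_out region=cs;
  by _Imputation_;
  class trt;
  psmodel trt(Treated='1') = age;
  match method=greedy(k=1) caliper=0.25;
  output out(obs=match)=work.matched matchid=_MatchID;
run;

proc sort data=work.matched;
  by _Imputation_;
run;

/* The matched sample size may differ slightly from one imputation to the next */
proc freq data=work.matched;
  tables _Imputation_*trt / norow nocol nopercent;
run;

/* Step 3: Cox model per matched dataset; estimates are log hazard ratios */
proc phreg data=work.matched;
  by _Imputation_;
  model time*event(0) = trt;
  ods output ParameterEstimates=work.phreg_est;
run;

/* Step 4: pool; each imputation's own standard error reflects its own sample size */
proc mianalyze parms=work.phreg_est;
  modeleffects trt;
run;
```

Nothing in the pooling step requires the matched datasets to have the same number of patients, because MIANALYZE only sees the estimate and standard error from each one.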

 

The question of one pooled survival curve is a bit different than one pooled hazard ratio estimate.  You could generate a plot using the pooled hazard rate, the HR upper bound and the HR lower bound for a fixed number in the initial risk set using a DATA step and SGPLOT.  Would something like that meet your needs?
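One possible reading of this suggestion, sketched under strong assumptions: a constant (exponential) baseline hazard h0 in the reference group, proportional hazards, and a fixed initial risk set n0. The values of h0, n0, hr, hr_lo and hr_hi below are hypothetical placeholders, to be replaced by the pooled hazard ratio and its confidence limits from MIANALYZE and by a baseline hazard that suits your data.

```sas
/* Expected number still at risk over time, for a fixed initial risk set */
data work.curve;
  n0    = 100;     /* hypothetical initial risk set per group          */
  h0    = 0.05;    /* hypothetical constant baseline hazard per month  */
  hr    = 0.70;    /* placeholder for the pooled hazard ratio          */
  hr_lo = 0.50;    /* placeholder for the lower confidence limit       */
  hr_hi = 0.98;    /* placeholder for the upper confidence limit       */
  do t = 0 to 36;
    ref     = n0 * exp(-h0 * t);           /* reference group            */
    treated = n0 * exp(-h0 * hr * t);      /* treated group, pooled HR   */
    band_hi = n0 * exp(-h0 * hr_lo * t);   /* smaller HR, more survivors */
    band_lo = n0 * exp(-h0 * hr_hi * t);   /* larger HR, fewer survivors */
    output;
  end;
run;

proc sgplot data=work.curve;
  band x=t lower=band_lo upper=band_hi /
       transparency=0.6 legendlabel="Treated, HR confidence limits";
  series x=t y=ref     / legendlabel="Reference group";
  series x=t y=treated / legendlabel="Treated group (pooled HR)";
  xaxis label="Time (months)";
  yaxis label="Expected number still at risk" min=0;
run;
```

The band only reflects uncertainty in the hazard ratio, not in the assumed baseline hazard, so it is a model-based illustration and not a pooled Kaplan-Meier curve.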

 

SteveDenham

LisavH
Calcite | Level 5

Hi Steve, thanks again for your reply! 

 

The number of patients indeed does not differ much between the matched datasets. This would mean that the small differences will not pose any problem, since the results will be quite similar. I would have to look into the pooling step in more detail to generate a plot as you described, but I think this might be a good solution!

My only concern is whether this can be justified in a scientific journal, for example. But that's a whole different question to answer in a different topic/forum, so I will accept your post as a solution 🙂 

 

Thanks!

 

SteveDenham
Jade | Level 19

@LisavH 

Unless you are publishing in Technometrics or something similar, the methodology should not be suspect - it is basic Rubin or Allison.  Spell out what you did in short, non-jargon sentences and even editors or peer reviewers should be able to see what you did.  Now whether there is a more "complete" method (read: more difficult, but used by that reviewer in their dissertation) is something else to deal with.  It's kind of why I like being in the pre-clinical space, where it's more critical to get a good result to a sponsor for a decision than it is to get something published.

 

SteveDenham
