Re: Mediation Analysis for ordinal outcomes.

Giampaolo · Posted 05-22-2020 10:32 AM

Dear SAS users,

I have a three-level (low, moderate, high) ordinal variable on which I would like to perform a mediation analysis, but PROC CAUSALMED cannot be used for ordinal outcomes. Is there another procedure or a (hopefully simple) approach that could be used for this purpose? Thank you!

SteveDenham · Posted 05-26-2020 08:57 AM

Would the results make sense if you collapsed the response variable, say into low vs med and high together in one analysis, and med vs high in another? I haven't done enough causal analysis to know if that approach is reasonable or not, but it has the advantage of being simple to do.

SteveDenham
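The collapsing idea in the reply above can be sketched as follows. This is a minimal illustration in Python, not SAS, and it uses cumulative splits (low vs. moderate-or-high, and low-or-moderate vs. high), which is a common variant rather than exactly the second split proposed. The level labels, the `cumulative_splits` helper, and the sample data are all hypothetical.

```python
# Hypothetical ordinal outcome with three ordered levels.
LEVELS = ("low", "moderate", "high")

def cumulative_splits(outcome):
    """Return one 0/1 indicator list per threshold between adjacent levels."""
    codes = [LEVELS.index(value) for value in outcome]
    splits = {}
    for threshold in range(1, len(LEVELS)):
        label = f">= {LEVELS[threshold]}"
        # 1 when the observation is at or above the threshold level, else 0.
        splits[label] = [1 if code >= threshold else 0 for code in codes]
    return splits

# Made-up data, only to show the shape of the result.
outcome = ["low", "high", "moderate", "low", "high"]
splits = cumulative_splits(outcome)
for label, indicator in splits.items():
    print(label, indicator)
```

Each resulting binary indicator could then be analyzed separately with a procedure that accepts binary outcomes, at the cost of discarding the ordering information within each collapsed group.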

Giampaolo · Posted 05-26-2020 09:20 AM

Hi Steve,

Thank you for your reply. Yes, it would definitely make sense. The problem, though, is that when the dependent variable is dichotomized, the independent variable I would like to test as a possible mediator is no longer significant. Maybe that is because some information is lost when the outcome is treated as dichotomous? The outcome variable, the Nottingham Prognostic Index (a cancer prognostic score), is continuous but has a very difficult distribution to model. See the histogram.

Best

Giampaolo

SteveDenham · Posted 05-26-2020 09:28 AM

Is the lack of significance found in both dichotomized datasets? If so, then I suspect you will have to accept that you don't have enough data to drive the effect to a "significant" level. Could you calculate some sort of effect size for the mediator in the two datasets, and compare that to a CID (clinically important difference)?

SteveDenham

Giampaolo · Posted 05-26-2020 09:47 AM

Sorry if I was confusing. There is only one dataset. In one case I dichotomized the NPI variable. In the other case I converted NPI into an ordinal variable using two thresholds from the literature. I think you are right that the study is not sufficiently powered, but I was intrigued by the fact that my predictor was significant with the ordinal outcome and not with the binary one.

SteveDenham · Posted 05-26-2020 10:20 AM

You know, that makes sense given the distribution of the predictor variable. What happens if you just leave it as a continuous (well at least relatively continuous) variable? I think I misinterpreted this situation and confused the response variable with the mediating variable, so far as dichotomous/ordinal goes.

So if you want to dichotomize NPI, the cutpoint is going to be critical. What value preserves most of the information provided by the continuous version of the variable? Also, when I look at the pdf for NPI, it looks long-tailed to the right. What sort of distribution do you see if you take the natural log of the NPI? Perhaps ln(NPI) has a more distinguishing cutpoint, or perhaps ln(NPI) is more significant as a mediator? Lots of ways to go on this one.

SteveDenham
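A quick way to check whether a log transform tames a long right tail, as suggested above, is to compare the skewness before and after. The sketch below is Python rather than SAS, the `sample_skewness` helper is hypothetical, and the scores are made-up values, not the NPI data from this thread.

```python
import math
import statistics

def sample_skewness(values):
    """Moment-based skewness m3 / m2**1.5, using population moments."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5

# Made-up, right-skewed positive scores (assumption: all values > 0,
# otherwise the natural log is undefined).
scores = [2.1, 2.3, 2.4, 2.6, 2.9, 3.1, 3.4, 4.0, 5.2, 7.8]
logged = [math.log(v) for v in scores]

raw_skew = sample_skewness(scores)
log_skew = sample_skewness(logged)
print(f"skewness raw: {raw_skew:.2f}, after ln: {log_skew:.2f}")
```

A skewness closer to zero after the transform suggests the logged variable may be easier to model or to cut at a meaningful threshold, though it does not by itself say which cutpoint to use.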

Giampaolo · Posted 05-26-2020 01:16 PM

I have tried PROC GENMOD using several different distributions with a log link, but the association with continuous NPI was not significant. When I tried the nonparametric Jonckheere-Terpstra test, however, the association of continuous NPI with the predictor was significant. Maybe the problem is that the available distributions do not reflect the data?
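For readers unfamiliar with the Jonckheere-Terpstra test mentioned above, here is a minimal sketch of the statistic with its large-sample normal approximation. It is Python rather than SAS (in SAS the test is requested with the JT option on the TABLES statement of PROC FREQ), it applies no tie correction to the variance, and the function name and data are hypothetical. How the groups were actually formed in this analysis is not stated in the thread.

```python
import math

def jonckheere_terpstra(groups):
    """JT test for an ordered trend across groups.

    `groups` is a list of samples, listed in the hypothesized order.
    Returns (J, z, two-sided p) using the no-ties normal approximation.
    """
    j_stat = 0.0
    for i in range(len(groups)):
        for k in range(i + 1, len(groups)):
            # Count pairs where the later group has the larger value;
            # ties contribute one half.
            for x in groups[i]:
                for y in groups[k]:
                    if x < y:
                        j_stat += 1.0
                    elif x == y:
                        j_stat += 0.5
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    mean = (n * n - sum(s * s for s in sizes)) / 4
    var = (n * n * (2 * n + 3) - sum(s * s * (2 * s + 3) for s in sizes)) / 72
    z = (j_stat - mean) / math.sqrt(var)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return j_stat, z, p_two_sided

# Made-up response values in three ordered groups.
j_stat, z, p_value = jonckheere_terpstra([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(j_stat, z, p_value)
```

Because the statistic depends only on the ordering of the values, it makes no assumption about the shape of the response distribution, which is one reason it can detect a trend that a misspecified parametric model misses.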

SteveDenham · Posted 05-27-2020 08:25 AM

I assume that for the JT test the row variable was based on the binning you presented in the graphic, and the column variables were the observed in the bin and the not observed (total - observed). If that is correct (or even close), then you have done a better job of giving an approximation to the distribution. Recall also that the distribution options and the canonical links apply to the X'beta matrix, so it includes all of the predictors. The fit is therefore much more dependent on the distribution of Y than on the distribution of any single predictor.

SteveDenham

Giampaolo · Posted 05-29-2020 03:29 PM

Hi Steve,

Thank you for stating the points I needed to remember in using the procedure, and apologies for coming back to this post again after a few days, but there is one thing that has been on my mind and that I was hoping to clarify. I understand that the link options apply to all the predictors. With respect to the distribution options, though, I am not sure whether I have misunderstood the procedure or misinterpreted your message. I thought that with the distribution option one attempts to model the distribution of the response variable Y. Am I wrong? Partially wrong? Could you please explain whether this interpretation conflicts with your sentence "the distribution options and the canonical links apply to the X'beta matrix, so it includes all of the predictors..."?

Thank you very much!

Giampaolo

SteveDenham · Posted 06-01-2020 08:09 AM

Two sides of the same coin. The X'beta matrix gives the linear predictor, which the inverse link turns into the predicted Yhat values. In a generalized linear model that is what is used to predict the dependent variable (Y) values. You are correct in that we generally pick distributions and links based on our knowledge of the dependent variable, but the actual algorithmic processes are done using the X'beta and Y values in combination.

SteveDenham
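The "two sides of the same coin" point above can be written out in standard generalized linear model notation. The symbols here are not from the thread and are introduced only for illustration: $g$ is the link function, $F$ the chosen response distribution with density $f$, and $\phi$ a dispersion parameter.

```latex
\begin{aligned}
\eta_i &= x_i^{\top}\beta
  && \text{(linear predictor, built from all predictors)} \\
\mu_i &= g^{-1}(\eta_i)
  && \text{(the link maps the linear predictor to the mean of } Y_i\text{)} \\
Y_i &\sim F(\mu_i, \phi)
  && \text{(the distribution option describes } Y_i \text{ around that mean)} \\
\ell(\beta) &= \sum_{i} \log f(y_i;\, \mu_i, \phi)
  && \text{(the fit uses } X\beta \text{ and } Y \text{ together)}
\end{aligned}
```

In this reading, the distribution is chosen for Y, conditional on the predictors, while the estimation itself works through the linear predictor, so both descriptions refer to the same model.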

Giampaolo · Posted 06-01-2020 10:54 AM

Thank You!
