asgee
Obsidian | Level 7

I'm trying to run an analysis where I have a continuous variable (serum) and binary outcome "par" (yes/no). 

 

My analysis requires that I impute my dataset (50 imputations). I'm still quite new to creating plots, and am having trouble visualizing the spline effects of my logistic model. Here's my current code:

 

title "Restricted Splines";
title2 "Four Internal Knots";
ods select ParameterEstimates SplineKnots;
proc logistic data=imputed_50;
   effect spl = spline(serum / details naturalcubic basis=tpf(noint)
                       knotmethod=percentilelist(5 35 65 95)); /* RESTRICTED CUBIC SPLINE BASED ON 4 KNOTS */
   class par (ref='0') dev (ref='0') eth (ref='1') product (ref='0') / param=ref;
   model par (event='1') = spl age weight eth dev product / selection=none covb;
   by _Imputation_;     /* RUNNING 50 ITERATIONS OF THIS MODEL BASED ON AN IMPUTED DATASET */
   ods output ParameterEstimates=Lgsparms;
run;

What I'm trying to replicate is this graph based on the SAS documentation of Visualizing regressions with splines:

[Image: Knots.PNG]

The example above is obviously using a continuous dependent variable (MPG). I'm having trouble showing the spline effect / spline points of "serum" for a binary outcome "par", particularly since my modelling is based on 50 imputed datasets rather than a single one.

 

I understand there's the EFFECTPLOT statement in PROC LOGISTIC, but I'm not sure how to use it when my modelling is based on an imputed dataset. Any help plotting this would be very much appreciated.

Rick_SAS
SAS Super FREQ

Do you really want to view 50 plots, one for each imputation? Typically you will want to suppress all output (tables and graphs) by using the NOPRINT option during the estimation of parameters for each set of imputed values. You then run PROC MIANALYZE to aggregate the 50 estimates into one "best" set of parameter estimates. You can then visualize the predicted probabilities for the final model.

asgee
Obsidian | Level 7

Hi @Rick_SAS ! Yes, I forgot to mention that. That's the next section of my code (I run PROC MIANALYZE and aggregate the 50 estimates). The "ParameterEstimates=Lgsparms" data set is the output that I clean up later and use to produce one table that shows the "best" parameter estimates using the MODELEFFECTS statement:

   ods trace on;
		proc mianalyze parms=Lgsparms; 
		modeleffects Intercept spline1 spline2 spline3 age weight eth dev product; /* 3 spline effects based on 4 knots */
		ods output ParameterEstimates=mi_results; /* must come before RUN to capture this step's table */
		run;
   ods trace off;
   ods listing close;

From there I use the "mi_results" output to produce a table summarizing that one "best" set of parameter estimates.

 

As you pointed out, my trouble really is just trying to visualize the predicted probabilities for that aggregated "best" set. I'm not sure exactly how to isolate the predicted probabilities (either through PROC LOGISTIC or PROC MIANALYZE) for that aggregated "best" set. I tried adding a "predicted=Fit" statement beside the (ods output ParameterEstimates=Lgsparms) line in PROC LOGISTIC but it gave me an error instead.

 

I understand that if I have an output data set that contains just the predicted probabilities for that final model, I can use PROC SGPLOT (or EFFECTPLOT?) to visualize those results. I'm not sure if I'm missing an option or a data step to get to that point...

Rick_SAS
SAS Super FREQ

It sounds like you have three problems:

1. You need to evaluate the final model, which is defined by the parameter estimates table.

The logistic model is 

eta = Intercept + b1*spline1 + b2*spline2 + ... + b8*product

prob = logistic(eta)

2. Because your model is defined in terms of splines, you should output the design matrix, which will contain the spline1-spline3 variables.

3. You need to decide how to visualize a surface in 8 variables. The typical way is to create a slice plot in which you fix a value for the age weight eth dev and product variables (often a mean or median value) and then plot the predicted probability versus the serum variable. The general idea of a slice plot is explained in this blog post: https://blogs.sas.com/content/iml/2017/12/18/visualize-multivariate-regression-models-by-slicing-con...
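
For step 2, a minimal sketch of writing out the design matrix, assuming the same data set and model as in the question. OUTDESIGN= and OUTDESIGNONLY are options on the PROC LOGISTIC statement; the data set name DesignMat is a placeholder chosen for illustration, and the names of the generated spline columns should be checked rather than assumed.

```sas
/* Sketch only: DesignMat is a hypothetical output data set name.          */
/* OUTDESIGNONLY writes the design matrix without fitting the model.       */
proc logistic data=imputed_50 outdesign=DesignMat outdesignonly;
   by _Imputation_;
   effect spl = spline(serum / naturalcubic basis=tpf(noint)
                       knotmethod=percentilelist(5 35 65 95));
   class dev (ref='0') eth (ref='1') product (ref='0') / param=ref;
   model par (event='1') = spl age weight eth dev product;
run;

/* Inspect the generated column names before using them in later steps.    */
proc contents data=DesignMat varnum;
run;
```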

I think you might need to create the plot manually.
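
To make the slice concrete, here is the model written out under some stated assumptions: the class variables are held at their reference levels (dev=0, eth=1, product=0), so that under param=ref their dummy columns are all zero, and age and weight are fixed at chosen values a_0 and w_0 (for example their means). The symbols a_0, w_0 and B_1, B_2, B_3 are illustrative notation, not from the posts; B_j(s) stands for the spline1-spline3 design columns evaluated at serum value s, and the coefficients are the pooled estimates from PROC MIANALYZE.

```latex
\eta(s) = \hat\beta_0 + \hat\beta_1 B_1(s) + \hat\beta_2 B_2(s) + \hat\beta_3 B_3(s)
          + \hat\beta_{\text{age}}\, a_0 + \hat\beta_{\text{weight}}\, w_0,
\qquad
\hat p(s) = \frac{1}{1 + e^{-\eta(s)}}
```

Plotting \hat p(s) against s over the observed range of serum gives the slice plot.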

 

There are a lot of technical details involved in doing this so I recommend asking if you really need to do this visualization. It might be sufficient to plot the predicted probability for each subject in the data, which is much easier.
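
A rough sketch of that easier option, under several assumptions that are not in the original posts: the data contain a subject identifier (called id here, a hypothetical name), and averaging each subject's predicted probability over the 50 imputations is acceptable for display purposes. This is not a pooled model in the PROC MIANALYZE sense, only a descriptive summary. Note that predicted probabilities are requested with the OUTPUT statement of PROC LOGISTIC rather than on the ODS OUTPUT line.

```sas
/* Sketch only: "id" is a hypothetical subject identifier; Pred and PredAvg */
/* are placeholder data set names.                                          */
proc logistic data=imputed_50 noprint;
   by _Imputation_;
   effect spl = spline(serum / naturalcubic basis=tpf(noint)
                       knotmethod=percentilelist(5 35 65 95));
   class dev (ref='0') eth (ref='1') product (ref='0') / param=ref;
   model par (event='1') = spl age weight eth dev product;
   output out=Pred predicted=Fit;   /* one predicted probability per row, per imputation */
run;

/* Average the predictions (and serum, in case it was imputed) per subject. */
proc sort data=Pred;
   by id;
run;

proc means data=Pred noprint;
   by id;
   var Fit serum;
   output out=PredAvg mean=MeanFit MeanSerum;
run;

proc sgplot data=PredAvg;
   scatter x=MeanSerum y=MeanFit;
   xaxis label="Serum (mean across imputations)";
   yaxis label="Predicted probability of par=1 (mean across imputations)" min=0 max=1;
run;
```

Because the other covariates vary from subject to subject, the points will scatter around the serum trend rather than trace a single curve.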

asgee
Obsidian | Level 7
Hi @Rick_SAS , thanks for your reply. The steps you've outlined make sense. I think I'll continue exploring other options for now and follow up on your suggestion of plotting the predicted probabilities for each subject in the data. I'll have a look at the design matrix link you've sent; that seems like a good first step.

Thanks!
Saifulinfs
Fluorite | Level 6

Hello,

I am following this discussion as it's helpful. Did you figure out how to get the predicted probabilities for the final model?

Season
Lapis Lazuli | Level 10

In addition to plotting, there is another subtlety I am interested in: the generation of the spline effects themselves. Based on your code, the restricted cubic splines are created after the imputation is done. I wonder whether you included spline effects in the imputation model. If not, I am concerned about how this may affect your analytical results, as the majority of research papers (e.g., Multiple imputation of missing data under missing at random: compatible imputation models are not su...) have shown that omitting nonlinear and interaction terms at the imputation stage and simply creating them on the spot at the analysis stage biases the regression coefficients toward zero. However, there are also research papers showing that creating them in an ad hoc manner may be a better choice in certain circumstances (e.g., Navigating choices when applying multiple imputation in the presence of multi-level categorical inte...). I personally have not yet found any research paper specifically showing how spline terms, as a special and rather sophisticated kind of nonlinear term, should be handled with missing data. After all, creating them on the spot is rather easy, and some researchers (me included) would be inclined to adopt that approach if it were shown to be viable.
