sas_epi
Calcite | Level 5

Hello..

 

I'm running multiple multivariable linear regression models (same set of covariates, changing the primary predictor) using proc glm, and then using the effectplot statement in proc plm to plot each model. I would also like a single plot with all models overlaid on it. After reading through posts here, I figured that the best option for me was to output the predicted data from proc plm, merge the outputs, and then plot them using sgplot. I'm running my glm on an N of 90 and have 10 variables in the model: 2 are continuous, the rest are categorical. There are no missing data in this file.

Things were moving reasonably well until I looked at the data output from proc plm. The 'FitPlot' output data set has an N of 200. I recognize the first and last values of my primary predictor X variable. Based on some of the posts here, I think that the procedure is taking a set of values spanning the X variable and using them in the model to predict the outcome. Is this the case? If so, does this explain the increase in N from the original data?

I feel like I understand why the actual x values are not used (but would like to confirm this): the resulting predictions would be a 'jittered' scatter of points and would produce a jagged line. Is this so?

 

Thank you!

 

1 ACCEPTED SOLUTION

Rick_SAS
SAS Super FREQ

When you create an effect plot for a continuous variable, SAS procedures evaluate the regression model on an evenly spaced grid for the range of the X variable (ph1, I guess). By default, I think 201 points are used, but you say 200, so I might be wrong.

 

When you overlay the predicted values, each model (ph1, ph2, etc.) will have 200 (or so) points.

 

Why? Because you didn't provide a SCORE data set, so the procedure assumes you want to score on the range of the data.

It is not related to avoiding a "jittered scatter of points" or a "jagged line."  
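
For readers who want predictions at the observed ph1 values rather than on the default grid, the following is a minimal sketch of supplying a SCORE data set to proc plm. It reuses the item store ph1pred and the data set finaldata from the question; the names matage_mean, score_ph1, and ph1_scored are hypothetical. It assumes that year and parity are numeric and that the other CLASS variables are character (adjust the assignments if the types differ), and it holds matage at its sample mean, which I believe matches what the effect plot does by default for continuous covariates.

```sas
/* Sample mean of the continuous covariate that is held fixed */
proc means data=finaldata noprint;
  var matage;
  output out=matage_mean(keep=matage) mean=matage;
run;

/* One row per observed ph1 value; all other covariates fixed.
   ASSUMPTION: year and parity are numeric, the rest are character. */
data score_ph1;
  if _n_ = 1 then set matage_mean;   /* matage is retained on every row */
  set finaldata(keep=ph1);
  gender    = "F";
  site      = "DU";
  matrace   = "NH White";
  bmi_cat   = "Normal";
  education = "Bachelors or higher";
  income    = "100- <200 K";
  parity    = 1;
  year      = 2011;
run;

/* Score the stored model on those rows */
proc plm restore=ph1pred;
  score data=score_ph1 out=ph1_scored predicted=totscore;
run;

/* Sort by ph1 so a SERIES plot connects the points in order */
proc sort data=ph1_scored;
  by ph1;
run;
```

With this approach the output has one row per row of the SCORE data set (90 here), so the number of predicted points is under your control rather than set by the plotting grid.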

 


4 REPLIES 4
ballardw
Super User

You may get a better answer if you can show the code for one of the regressions.

 

 

sas_epi
Calcite | Level 5

Thank you for your comment, ballardw. The code is below. This works: I get the plot that I want. My confusion is about the difference between the N that the glm model runs on (N=90) and the N of the data that plm outputs (N=200).

 


proc glm data=finaldata plots=(diagnostics residuals(smooth));
class year (ref="2011") gender(ref="F") site(ref="DU") bmi_cat (ref="Normal") matrace (ref="NH White") education (ref="Bachelors or higher") income (ref="100- <200 K") parity(ref="1");
model totscore= ph1 site bmi_cat matrace education income matage gender year parity/solution CLPARM;
store ph1pred;
run;
quit;

 

proc plm restore= ph1pred;
effectplot fit(x= ph1) / at(gender="F") at(site="DU") at(matrace="NH White") at(bmi_cat="Normal") at(education="Bachelors or higher") at(income="100- <200 K") at(parity="1") at(year="2011");
ods output FitPlot= ph1pred;
run;


data ph1pred;
set ph1pred (keep= _XCONT1 _PREDICTED);
rename _XCONT1= ph1;
rename _PREDICTED= totscore;
run;

 

....... After running the same model 14 times with different 'ph' variables and producing the output data, I merged them to get a data set with all the predicted data. Then I plot as below:

 


proc sgplot data= finalpred;
series x=ph1 y=totscore1 ;
.

.

.
series x=ph14 y=totscore14;

 

xaxis label="ph";
yaxis grid values=(1.5 to 4.5 by .5) label="Predicted score";
title "Predicted plot of ph1-ph14";
run;
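
As an alternative to merging the 14 data sets side by side, here is a rough sketch that stacks them in long form and lets sgplot draw one line per model with GROUP=. It assumes that data sets ph1pred through ph14pred already exist and that each has been reduced, as in the data step above, to the variables phK and totscore. The macro name stack_preds, the data set allpred, and the variables model and ph are hypothetical names introduced only for this illustration.

```sas
%macro stack_preds(n=14);
  data allpred;
    length model $8;
    /* Stack all prediction data sets, renaming phK to a common name */
    set
    %do i = 1 %to &n;
      ph&i.pred(in=in&i rename=(ph&i=ph))
    %end;
    ;
    /* Record which model each row came from */
    %do i = 1 %to &n;
      if in&i then model = "ph&i";
    %end;
  run;
%mend stack_preds;

%stack_preds(n=14)

title "Predicted plot of ph1-ph14";
proc sgplot data=allpred;
  series x=ph y=totscore / group=model;
  xaxis label="ph";
  yaxis grid values=(1.5 to 4.5 by .5) label="Predicted score";
run;
title;
```

The long layout avoids writing 14 separate SERIES statements and gives a legend keyed by model automatically.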

 

 

 

sas_epi
Calcite | Level 5

Thank you Rick, that makes sense to me.
