Hello,
Before relying on a linear regression, we need to run diagnostics first. When I fit the model with "proc surveyreg", should I run the diagnostics (residual plot ...) with "proc reg", or with "proc surveyreg"? If I should use "proc surveyreg", how can I get the residual plot?
Thank you
Which specific diagnostics are you wanting?
If the data comes from a complex sample and requires Proc Surveyreg then Reg is very likely not the tool to use for diagnostics. Proc Reg would not properly apply the weights from a complex sample and so the residuals would be extremely likely to be incorrect.
Perhaps Surveymeans or Surveyfreq would come into play.
Thank you so much!
I want to check the assumptions of linear regression, including residual normality, outliers, linear relationship between dependent variable and independent variables, homogeneity, and multicollinearity. Could you please tell me how to do those tests in proc surveyreg?
Thank you!
You can output the residuals from PROC SURVEYREG using the OUTPUT statement, and then you can plot them to take care of "residual normality, outliers, linear relationship between dependent variable and independent variables, homogeneity". I think the COVB option in the MODEL statement would address multicollinearity.
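As a rough sketch of that suggestion, the steps below write predicted values and residuals to a data set and then check the residuals for normality. The data set and variable names (mydata, strat, psu, wt, y, x1, x2, diag, pred, resid) are placeholders for illustration, not names from this thread, and the PROC UNIVARIATE step is unweighted, so treat its plots as a visual aid rather than a design-based test.

```sas
/* Placeholder names throughout: mydata, strat, psu, wt, y, x1, x2 */
proc surveyreg data=mydata;
   strata strat;                      /* design strata */
   cluster psu;                       /* primary sampling units */
   weight wt;                         /* sampling weight */
   model y = x1 x2 / solution;
   /* write predicted values and residuals to the data set DIAG */
   output out=diag predicted=pred residual=resid;
run;

/* Unweighted look at the residual distribution */
proc univariate data=diag;
   var resid;
   histogram resid / normal;                 /* histogram with fitted normal curve */
   qqplot resid / normal(mu=est sigma=est);  /* normal Q-Q plot */
run;
```

The same DIAG data set can then feed scatter plots of the residuals against the predicted values or against each predictor to look at outliers, linearity, and homogeneity.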
Hello, one more question.
I used the following code to try to get the residuals:
ods graphics on;
PROC SURVEYREG DATA= nh.outcomes nomcar;
STRATA sdmvstra;
CLUSTER sdmvpsu;
CLASS alpha16 age RIAGENDR PIR SDDSRVYR RIDRETH1;
WEIGHT glucwt4yr;
DOMAIN eligible;
model BMXBMI= age RIAGENDR PIR SDDSRVYR RIDRETH1 EIEER totalcounts alpha16/ stb adjrsq clparm solution vadjust=none COVB ;
lsmeans alpha16/ lines adjust=tukey;
output out=bmi p= predict r = residual ;
run;
quit;
ods graphics off;
But could you please tell me how to plot the residuals? I tried PROC FORECAST and PROC SGPLOT, but they are not working:
proc forecast data=bmi
out=pred outfull outresid;
id seqn;
var age RIAGENDR PIR SDDSRVYR RIDRETH1 EIEER totalcounts alpha16;
run;
proc sgplot data=pred;
where _type_='RESIDUAL';
needle x=seqn y=age RIAGENDR PIR SDDSRVYR RIDRETH1 EIEER totalcounts alpha16 / markers;
run;
@knighsson wrote:
proc sgplot data=pred;
where _type_='RESIDUAL';
needle x=seqn y=age RIAGENDR PIR SDDSRVYR RIDRETH1 EIEER totalcounts alpha16 / markers;
run;
The NEEDLE statement allows only one variable after Y=. But I doubt you really want a NEEDLE here. Looking at residuals is usually done via scatter plots, so you can use the SCATTER statement.
Regarding _type_='RESIDUAL': you need to look (with your own eyes) inside the data set named BMI that PROC SURVEYREG created (it is not named PRED) and see how it is structured. That will tell you whether you need a WHERE statement and what it should say, and it will show you the variable names you can use. Essentially, if you look at BMI with your own eyes, you will see everything you need to code some sort of residual plot.
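Something along these lines might be a starting point. It uses the names from the OUTPUT statement in the posted code (data set BMI, variables predict and residual). Plotting the residuals against the predicted values and against EIEER is only one possible choice, and the plots are unweighted, so adjust after you have looked at the data set.

```sas
/* Inspect the data set written by the OUTPUT statement */
proc contents data=bmi;
run;

proc print data=bmi(obs=10);    /* first 10 rows only */
run;

/* Residuals versus predicted values, with a reference line at zero */
proc sgplot data=bmi;
   scatter x=predict y=residual;
   refline 0 / axis=y;
run;

/* Residuals versus one continuous predictor (EIEER chosen as an example) */
proc sgplot data=bmi;
   scatter x=EIEER y=residual;
   refline 0 / axis=y;
run;
```

No WHERE statement appears here because the OUTPUT statement adds predict and residual as ordinary columns. The PROC CONTENTS output is where to confirm that.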