I'm doing multiple regression, but first I need to check if there is correlation between my independent variables. I have a lot of independent variables and my independent variables have value of either 1 or 0. I used proc corr to generate correlation table.
proc corr data=For_Reg;
var ADDITIONAL_VESSEL NSTEMI NUCLEAR_STRESS_TEST OTHER STABLE_ANGINA STAGED_INTERVENTIONS STEMI TREADMILL_STRESS_TEST
UNSTABLE_ANGINA;
run;
and get this output:
Pearson Correlation Coefficients, N = 644 | |||||||||
Prob > |r| under H0: Rho=0 | |||||||||
ADDITIONAL_VESSEL | NSTEMI | NUCLEAR_STRESS_TEST | OTHER | STABLE_ANGINA | STAGED_INTERVENTIONS | STEMI | TREADMILL_STRESS_TEST | UNSTABLE_ANGINA | |
ADDITIONAL_VESSEL | 1 | -0.04986 | 0.02109 | 0.04884 | -0.0228 | 0.87831 | -0.06034 | 0.02035 | 0.06797 |
0.2063 | 0.5933 | 0.2158 | 0.5636 | <.0001 | 0.1261 | 0.6063 | 0.0848 | ||
NSTEMI | -0.04986 | 1 | -0.11717 | -0.17803 | -0.18986 | -0.04254 | -0.18715 | -0.14088 | -0.23739 |
0.2063 | 0.0029 | <.0001 | <.0001 | 0.281 | <.0001 | 0.0003 | <.0001 | ||
NUCLEAR_STRESS_TEST | 0.02109 | -0.11717 | 1 | 0.09684 | 0.20451 | -0.00098 | -0.1935 | 0.88221 | -0.00941 |
0.5933 | 0.0029 | 0.0139 | <.0001 | 0.9801 | <.0001 | <.0001 | 0.8115 | ||
OTHER | 0.04884 | -0.17803 | 0.09684 | 1 | -0.24174 | 0.03422 | -0.23828 | 0.12936 | -0.30225 |
0.2158 | <.0001 | 0.0139 | <.0001 | 0.3859 | <.0001 | 0.001 | <.0001 | ||
STABLE_ANGINA | -0.0228 | -0.18986 | 0.20451 | -0.24174 | 1 | -0.05808 | -0.25412 | 0.20464 | -0.32235 |
0.5636 | <.0001 | <.0001 | <.0001 | 0.141 | <.0001 | <.0001 | <.0001 | ||
STAGED_INTERVENTIONS | 0.87831 | -0.04254 | -0.00098 | 0.03422 | -0.05808 | 1 | 0.03174 | -0.01883 | 0.02523 |
<.0001 | 0.281 | 0.9801 | 0.3859 | 0.141 | 0.4214 | 0.6333 | 0.5227 | ||
STEMI | -0.06034 | -0.18715 | -0.1935 | -0.23828 | -0.25412 | 0.03174 | 1 | -0.20337 | -0.31774 |
0.1261 | <.0001 | <.0001 | <.0001 | <.0001 | 0.4214 | <.0001 | <.0001 | ||
TREADMILL_STRESS_TEST | 0.02035 | -0.14088 | 0.88221 | 0.12936 | 0.20464 | -0.01883 | -0.20337 | 1 | -0.0115 |
0.6063 | 0.0003 | <.0001 | 0.001 | <.0001 | 0.6333 | <.0001 | 0.7709 | ||
UNSTABLE_ANGINA | 0.06797 | -0.23739 | -0.00941 | -0.30225 | -0.32235 | 0.02523 | -0.31774 | -0.0115 | 1 |
0.0848 | <.0001 | 0.8115 | <.0001 | <.0001 | 0.5227 | <.0001 | 0.7709 |
Why did I get two lines for each variable? Is it something to do with my data?
proc princomp data=For_Reg;
var ADDITIONAL_VESSEL NSTEMI NUCLEAR_STRESS_TEST OTHER STABLE_ANGINA STAGED_INTERVENTIONS STEMI TREADMILL_STRESS_TEST
UNSTABLE_ANGINA;
run;
This code produced a different correlation table, which makes more sense to me. Which one is the right procedure for my case?
Correlation Matrix | |||||||||
ADDITIONAL_VESSEL | NSTEMI | NUCLEAR_STRESS_TEST | OTHER | STABLE_ANGINA | STAGED_INTERVENTIONS | STEMI | TREADMILL_STRESS_TEST | UNSTABLE_ANGINA | |
ADDITIONAL_VESSEL | 1 | -0.0499 | 0.0211 | 0.0488 | -0.0228 | 0.8783 | -0.0603 | 0.0203 | 0.068 |
NSTEMI | -0.0499 | 1 | -0.1172 | -0.178 | -0.1899 | -0.0425 | -0.1871 | -0.1409 | -0.2374 |
NUCLEAR_STRESS_TEST | 0.0211 | -0.1172 | 1 | 0.0968 | 0.2045 | -0.001 | -0.1935 | 0.8822 | -0.0094 |
OTHER | 0.0488 | -0.178 | 0.0968 | 1 | -0.2417 | 0.0342 | -0.2383 | 0.1294 | -0.3023 |
STABLE_ANGINA | -0.0228 | -0.1899 | 0.2045 | -0.2417 | 1 | -0.0581 | -0.2541 | 0.2046 | -0.3224 |
STAGED_INTERVENTIONS | 0.8783 | -0.0425 | -0.001 | 0.0342 | -0.0581 | 1 | 0.0317 | -0.0188 | 0.0252 |
STEMI | -0.0603 | -0.1871 | -0.1935 | -0.2383 | -0.2541 | 0.0317 | 1 | -0.2034 | -0.3177 |
TREADMILL_STRESS_TEST | 0.0203 | -0.1409 | 0.8822 | 0.1294 | 0.2046 | -0.0188 | -0.2034 | 1 | -0.0115 |
UNSTABLE_ANGINA | 0.068 | -0.2374 | -0.0094 | -0.3023 | -0.3224 | 0.0252 | -0.3177 | -0.0115 | 1 |
First, be aware that pasting many kinds of content into the main window of this message board reformats it, often removing leading blanks. In the original tables each variable name spans two rows, but in the pasted version the spanning was undone and the second row of each pair was shifted to the left.
The second row holds the p-values for the hypothesis of zero correlation. If you don't want them, use the NOPROB option on the PROC CORR statement.
Both procedures report the same Pearson correlations. The main difference between the two tables is rounding of the calculated coefficients, for example -0.04986 vs. -0.0499.
The second rows are p-values for the test of the null hypothesis that the correlation coefficient is zero. You can turn off the p-values by using the NOPROB option, like this:
proc corr data=sashelp.class noprob;
run;
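With many independent variables, the full matrix can be hard to scan. As a rough sketch (not something suggested in the thread itself), the correlations could be written to a data set with OUTP= and then filtered for strongly correlated pairs. The data set names CorrOut and HighCorr, the variable names Var2 and r, and the 0.8 cutoff are all assumptions chosen for illustration:

```sas
/* Write the Pearson correlations to a data set instead of printing them. */
/* CorrOut is a hypothetical output data set name.                        */
proc corr data=For_Reg noprob nosimple noprint outp=CorrOut;
   var ADDITIONAL_VESSEL NSTEMI NUCLEAR_STRESS_TEST OTHER STABLE_ANGINA
       STAGED_INTERVENTIONS STEMI TREADMILL_STRESS_TEST UNSTABLE_ANGINA;
run;

/* Keep one row per pair of variables whose |r| exceeds the cutoff. */
/* The 0.8 cutoff is an arbitrary assumption, not a formal rule.    */
data HighCorr;
   set CorrOut(where=(_TYPE_ = 'CORR'));
   length Var2 $32;
   array v{*} ADDITIONAL_VESSEL NSTEMI NUCLEAR_STRESS_TEST OTHER STABLE_ANGINA
              STAGED_INTERVENTIONS STEMI TREADMILL_STRESS_TEST UNSTABLE_ANGINA;
   do i = 1 to dim(v);
      Var2 = vname(v{i});
      r = v{i};
      /* The name comparison skips the diagonal and keeps each pair once. */
      if upcase(_NAME_) < upcase(Var2) and abs(r) > 0.8 then output;
   end;
   keep _NAME_ Var2 r;
run;

proc print data=HighCorr noobs;
run;
```

In the table posted in the question, the only coefficients above 0.8 are ADDITIONAL_VESSEL with STAGED_INTERVENTIONS (0.87831) and NUCLEAR_STRESS_TEST with TREADMILL_STRESS_TEST (0.88221).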
@zhuxiaoyan1 wrote:
I'm doing multiple regression, but first I need to check if there is correlation between my independent variables. I have a lot of independent variables and my independent variables have value of either 1 or 0. I used proc corr to generate correlation table.
Another way to determine the effect of correlation between the independent variables is to use the VIF (variance inflation factor) option in the MODEL statement of PROC REG.
According to the documentation:
"The VIF option in the MODEL statement provides the variance inflation factors (VIF). These factors measure the inflation in the variances of the parameter estimates due to collinearities that exist among the regressor (independent) variables. There are no formal criteria for deciding if a VIF is large enough to affect the predicted values."
A good example is: http://documentation.sas.com/?cdcId=statcdc&cdcVersion=14.2&docsetId=statug&docsetTarget=statug_reg_...
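A minimal sketch of the VIF approach on the data set from the question might look like the following. The thread never names the dependent variable, so OUTCOME is a hypothetical placeholder, and the TOL option (tolerance, the reciprocal of the VIF) is added here only as a convenience:

```sas
/* OUTCOME is a hypothetical name for the dependent variable; */
/* replace it with the actual response variable in For_Reg.   */
proc reg data=For_Reg;
   model OUTCOME = ADDITIONAL_VESSEL NSTEMI NUCLEAR_STRESS_TEST OTHER
                   STABLE_ANGINA STAGED_INTERVENTIONS STEMI
                   TREADMILL_STRESS_TEST UNSTABLE_ANGINA / vif tol;
run;
quit;
```

The parameter estimates table then gains a Variance Inflation column. Given the correlations near 0.88 in the question, ADDITIONAL_VESSEL, STAGED_INTERVENTIONS, NUCLEAR_STRESS_TEST and TREADMILL_STRESS_TEST would be expected to show larger VIFs than the other variables, although, as the documentation notes, there is no formal cutoff.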