zhuxiaoyan1
Quartz | Level 8

I'm doing multiple regression, but first I need to check whether my independent variables are correlated with each other. I have a lot of independent variables, and each one takes a value of either 1 or 0. I used PROC CORR to generate a correlation table.

proc corr data=For_Reg;
  var ADDITIONAL_VESSEL NSTEMI NUCLEAR_STRESS_TEST OTHER STABLE_ANGINA
      STAGED_INTERVENTIONS STEMI TREADMILL_STRESS_TEST UNSTABLE_ANGINA;
run;

 

and get this output:

 

Pearson Correlation Coefficients, N = 644
Prob > |r| under H0: Rho=0

| | ADDITIONAL_VESSEL | NSTEMI | NUCLEAR_STRESS_TEST | OTHER | STABLE_ANGINA | STAGED_INTERVENTIONS | STEMI | TREADMILL_STRESS_TEST | UNSTABLE_ANGINA |
|---|---|---|---|---|---|---|---|---|---|
| ADDITIONAL_VESSEL | 1 | -0.04986 | 0.02109 | 0.04884 | -0.0228 | 0.87831 | -0.06034 | 0.02035 | 0.06797 |
| | | 0.2063 | 0.5933 | 0.2158 | 0.5636 | <.0001 | 0.1261 | 0.6063 | 0.0848 |
| NSTEMI | -0.04986 | 1 | -0.11717 | -0.17803 | -0.18986 | -0.04254 | -0.18715 | -0.14088 | -0.23739 |
| | 0.2063 | | 0.0029 | <.0001 | <.0001 | 0.281 | <.0001 | 0.0003 | <.0001 |
| NUCLEAR_STRESS_TEST | 0.02109 | -0.11717 | 1 | 0.09684 | 0.20451 | -0.00098 | -0.1935 | 0.88221 | -0.00941 |
| | 0.5933 | 0.0029 | | 0.0139 | <.0001 | 0.9801 | <.0001 | <.0001 | 0.8115 |
| OTHER | 0.04884 | -0.17803 | 0.09684 | 1 | -0.24174 | 0.03422 | -0.23828 | 0.12936 | -0.30225 |
| | 0.2158 | <.0001 | 0.0139 | | <.0001 | 0.3859 | <.0001 | 0.001 | <.0001 |
| STABLE_ANGINA | -0.0228 | -0.18986 | 0.20451 | -0.24174 | 1 | -0.05808 | -0.25412 | 0.20464 | -0.32235 |
| | 0.5636 | <.0001 | <.0001 | <.0001 | | 0.141 | <.0001 | <.0001 | <.0001 |
| STAGED_INTERVENTIONS | 0.87831 | -0.04254 | -0.00098 | 0.03422 | -0.05808 | 1 | 0.03174 | -0.01883 | 0.02523 |
| | <.0001 | 0.281 | 0.9801 | 0.3859 | 0.141 | | 0.4214 | 0.6333 | 0.5227 |
| STEMI | -0.06034 | -0.18715 | -0.1935 | -0.23828 | -0.25412 | 0.03174 | 1 | -0.20337 | -0.31774 |
| | 0.1261 | <.0001 | <.0001 | <.0001 | <.0001 | 0.4214 | | <.0001 | <.0001 |
| TREADMILL_STRESS_TEST | 0.02035 | -0.14088 | 0.88221 | 0.12936 | 0.20464 | -0.01883 | -0.20337 | 1 | -0.0115 |
| | 0.6063 | 0.0003 | <.0001 | 0.001 | <.0001 | 0.6333 | <.0001 | | 0.7709 |
| UNSTABLE_ANGINA | 0.06797 | -0.23739 | -0.00941 | -0.30225 | -0.32235 | 0.02523 | -0.31774 | -0.0115 | 1 |
| | 0.0848 | <.0001 | 0.8115 | <.0001 | <.0001 | 0.5227 | <.0001 | 0.7709 | |

 Why did I get two lines for each variable? Is it something to do with my data?

 

proc princomp data=For_Reg;
  var ADDITIONAL_VESSEL NSTEMI NUCLEAR_STRESS_TEST OTHER STABLE_ANGINA
      STAGED_INTERVENTIONS STEMI TREADMILL_STRESS_TEST UNSTABLE_ANGINA;
run;

This code produced a different correlation table, which makes more sense to me. Which one is the right procedure for my case?

 

Correlation Matrix

| | ADDITIONAL_VESSEL | NSTEMI | NUCLEAR_STRESS_TEST | OTHER | STABLE_ANGINA | STAGED_INTERVENTIONS | STEMI | TREADMILL_STRESS_TEST | UNSTABLE_ANGINA |
|---|---|---|---|---|---|---|---|---|---|
| ADDITIONAL_VESSEL | 1 | -0.0499 | 0.0211 | 0.0488 | -0.0228 | 0.8783 | -0.0603 | 0.0203 | 0.068 |
| NSTEMI | -0.0499 | 1 | -0.1172 | -0.178 | -0.1899 | -0.0425 | -0.1871 | -0.1409 | -0.2374 |
| NUCLEAR_STRESS_TEST | 0.0211 | -0.1172 | 1 | 0.0968 | 0.2045 | -0.001 | -0.1935 | 0.8822 | -0.0094 |
| OTHER | 0.0488 | -0.178 | 0.0968 | 1 | -0.2417 | 0.0342 | -0.2383 | 0.1294 | -0.3023 |
| STABLE_ANGINA | -0.0228 | -0.1899 | 0.2045 | -0.2417 | 1 | -0.0581 | -0.2541 | 0.2046 | -0.3224 |
| STAGED_INTERVENTIONS | 0.8783 | -0.0425 | -0.001 | 0.0342 | -0.0581 | 1 | 0.0317 | -0.0188 | 0.0252 |
| STEMI | -0.0603 | -0.1871 | -0.1935 | -0.2383 | -0.2541 | 0.0317 | 1 | -0.2034 | -0.3177 |
| TREADMILL_STRESS_TEST | 0.0203 | -0.1409 | 0.8822 | 0.1294 | 0.2046 | -0.0188 | -0.2034 | 1 | -0.0115 |
| UNSTABLE_ANGINA | 0.068 | -0.2374 | -0.0094 | -0.3023 | -0.3224 | 0.0252 | -0.3177 | -0.0115 | 1 |

 

 

1 ACCEPTED SOLUTION

ballardw
Super User

First, be aware that pasting many kinds of content into the main window of this message board reformats it, often removing leading blanks. In the original tables each variable name spans two rows, but when pasted here the spanning was undone and the values in the second row were all shifted to the left.

The second row for each variable contains the p-values for the test of the hypothesis of zero correlation. If you don't want them, use the NOPROB option on the PROC CORR statement.

The main difference between the two outputs is the rounding of the calculated coefficients: -0.04986 vs. -0.0499.
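
As a quick check of the rounding point, a minimal sketch: the DATA step below rounds the PROC CORR value to four decimals so it can be compared with the PROC PRINCOMP value. The variable names r_corr and r_4dp are made up for illustration.

```sas
data _null_;
  r_corr = -0.04986;               /* coefficient as printed by PROC CORR */
  r_4dp  = round(r_corr, 0.0001);  /* round to four decimal places */
  put r_corr= r_4dp=;              /* write both values to the log */
run;
```

The rounded value written to the log should match the -0.0499 shown in the PROC PRINCOMP correlation matrix, which suggests the two procedures computed the same Pearson correlations.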


3 REPLIES 3
Rick_SAS
SAS Super FREQ

The second row for each variable contains the p-values for the test of the null hypothesis that the correlation coefficient is zero. You can turn off the p-values by using the NOPROB option, like this:

proc corr data=sashelp.class noprob;
run;
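
Applied to the data in the question, the same option might be used as sketched below. The data set name For_Reg and the variable names are taken from the original post; nothing else is assumed.

```sas
/* NOPROB suppresses the p-value row, leaving one row of */
/* Pearson correlation coefficients per variable         */
proc corr data=For_Reg noprob;
  var ADDITIONAL_VESSEL NSTEMI NUCLEAR_STRESS_TEST OTHER STABLE_ANGINA
      STAGED_INTERVENTIONS STEMI TREADMILL_STRESS_TEST UNSTABLE_ANGINA;
run;
```

With NOPROB the table should have the same one-row-per-variable layout as the PROC PRINCOMP correlation matrix.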

 

PaigeMiller
Diamond | Level 26

@zhuxiaoyan1 wrote:

I'm doing multiple regression, but first I need to check whether my independent variables are correlated with each other. I have a lot of independent variables, and each one takes a value of either 1 or 0. I used PROC CORR to generate a correlation table.

 


Another way to determine the effect of correlation between the independent variables is to use the VIF (variance inflation factor) option in the MODEL statement of PROC REG.

 

According to the documentation:

 

"The VIF option in the MODEL statement provides the variance inflation factors (VIF). These factors measure the inflation in the variances of the parameter estimates due to collinearities that exist among the regressor (independent) variables. There are no formal criteria for deciding if a VIF is large enough to affect the predicted values."

 

A good example is: http://documentation.sas.com/?cdcId=statcdc&cdcVersion=14.2&docsetId=statug&docsetTarget=statug_reg_...
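
A minimal sketch of the VIF approach on the data from the question follows. The response variable is not named in the thread, so Y below is a hypothetical placeholder that would need to be replaced with the actual dependent variable.

```sas
/* Y is a hypothetical name for the dependent variable; */
/* the original post does not say what it is called     */
proc reg data=For_Reg;
  model Y = ADDITIONAL_VESSEL NSTEMI NUCLEAR_STRESS_TEST OTHER STABLE_ANGINA
            STAGED_INTERVENTIONS STEMI TREADMILL_STRESS_TEST UNSTABLE_ANGINA
            / vif;  /* adds a variance inflation column to the parameter estimates */
run;
quit;
```

Given the correlations of about 0.88 reported in the question (ADDITIONAL_VESSEL with STAGED_INTERVENTIONS, and NUCLEAR_STRESS_TEST with TREADMILL_STRESS_TEST), those pairs would be the first places to look for elevated VIF values.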

--
Paige Miller
