Programming the statistical procedures from SAS

proc corr

I'm doing multiple regression, but first I need to check whether my independent variables are correlated with each other. I have a lot of independent variables, and each one takes a value of either 1 or 0. I used PROC CORR to generate a correlation table.

proc corr data=For_Reg;
   var ADDITIONAL_VESSEL NSTEMI NUCLEAR_STRESS_TEST OTHER STABLE_ANGINA
       STAGED_INTERVENTIONS STEMI TREADMILL_STRESS_TEST UNSTABLE_ANGINA;
run;

 

and get this output:

 

Pearson Correlation Coefficients, N = 644
Prob > |r| under H0: Rho=0

| | ADDITIONAL_VESSEL | NSTEMI | NUCLEAR_STRESS_TEST | OTHER | STABLE_ANGINA | STAGED_INTERVENTIONS | STEMI | TREADMILL_STRESS_TEST | UNSTABLE_ANGINA |
|---|---|---|---|---|---|---|---|---|---|
| ADDITIONAL_VESSEL | 1 | -0.04986 | 0.02109 | 0.04884 | -0.0228 | 0.87831 | -0.06034 | 0.02035 | 0.06797 |
| | | 0.2063 | 0.5933 | 0.2158 | 0.5636 | <.0001 | 0.1261 | 0.6063 | 0.0848 |
| NSTEMI | -0.04986 | 1 | -0.11717 | -0.17803 | -0.18986 | -0.04254 | -0.18715 | -0.14088 | -0.23739 |
| | 0.2063 | | 0.0029 | <.0001 | <.0001 | 0.281 | <.0001 | 0.0003 | <.0001 |
| NUCLEAR_STRESS_TEST | 0.02109 | -0.11717 | 1 | 0.09684 | 0.20451 | -0.00098 | -0.1935 | 0.88221 | -0.00941 |
| | 0.5933 | 0.0029 | | 0.0139 | <.0001 | 0.9801 | <.0001 | <.0001 | 0.8115 |
| OTHER | 0.04884 | -0.17803 | 0.09684 | 1 | -0.24174 | 0.03422 | -0.23828 | 0.12936 | -0.30225 |
| | 0.2158 | <.0001 | 0.0139 | | <.0001 | 0.3859 | <.0001 | 0.001 | <.0001 |
| STABLE_ANGINA | -0.0228 | -0.18986 | 0.20451 | -0.24174 | 1 | -0.05808 | -0.25412 | 0.20464 | -0.32235 |
| | 0.5636 | <.0001 | <.0001 | <.0001 | | 0.141 | <.0001 | <.0001 | <.0001 |
| STAGED_INTERVENTIONS | 0.87831 | -0.04254 | -0.00098 | 0.03422 | -0.05808 | 1 | 0.03174 | -0.01883 | 0.02523 |
| | <.0001 | 0.281 | 0.9801 | 0.3859 | 0.141 | | 0.4214 | 0.6333 | 0.5227 |
| STEMI | -0.06034 | -0.18715 | -0.1935 | -0.23828 | -0.25412 | 0.03174 | 1 | -0.20337 | -0.31774 |
| | 0.1261 | <.0001 | <.0001 | <.0001 | <.0001 | 0.4214 | | <.0001 | <.0001 |
| TREADMILL_STRESS_TEST | 0.02035 | -0.14088 | 0.88221 | 0.12936 | 0.20464 | -0.01883 | -0.20337 | 1 | -0.0115 |
| | 0.6063 | 0.0003 | <.0001 | 0.001 | <.0001 | 0.6333 | <.0001 | | 0.7709 |
| UNSTABLE_ANGINA | 0.06797 | -0.23739 | -0.00941 | -0.30225 | -0.32235 | 0.02523 | -0.31774 | -0.0115 | 1 |
| | 0.0848 | <.0001 | 0.8115 | <.0001 | <.0001 | 0.5227 | <.0001 | 0.7709 | |

 Why did I get two lines for each variable? Is it something to do with my data?

 

proc princomp data=For_Reg;
   var ADDITIONAL_VESSEL NSTEMI NUCLEAR_STRESS_TEST OTHER STABLE_ANGINA
       STAGED_INTERVENTIONS STEMI TREADMILL_STRESS_TEST UNSTABLE_ANGINA;
run;

This code produced a different correlation table, which makes more sense. Which one is the right procedure for my case?

 

Correlation Matrix

| | ADDITIONAL_VESSEL | NSTEMI | NUCLEAR_STRESS_TEST | OTHER | STABLE_ANGINA | STAGED_INTERVENTIONS | STEMI | TREADMILL_STRESS_TEST | UNSTABLE_ANGINA |
|---|---|---|---|---|---|---|---|---|---|
| ADDITIONAL_VESSEL | 1 | -0.0499 | 0.0211 | 0.0488 | -0.0228 | 0.8783 | -0.0603 | 0.0203 | 0.068 |
| NSTEMI | -0.0499 | 1 | -0.1172 | -0.178 | -0.1899 | -0.0425 | -0.1871 | -0.1409 | -0.2374 |
| NUCLEAR_STRESS_TEST | 0.0211 | -0.1172 | 1 | 0.0968 | 0.2045 | -0.001 | -0.1935 | 0.8822 | -0.0094 |
| OTHER | 0.0488 | -0.178 | 0.0968 | 1 | -0.2417 | 0.0342 | -0.2383 | 0.1294 | -0.3023 |
| STABLE_ANGINA | -0.0228 | -0.1899 | 0.2045 | -0.2417 | 1 | -0.0581 | -0.2541 | 0.2046 | -0.3224 |
| STAGED_INTERVENTIONS | 0.8783 | -0.0425 | -0.001 | 0.0342 | -0.0581 | 1 | 0.0317 | -0.0188 | 0.0252 |
| STEMI | -0.0603 | -0.1871 | -0.1935 | -0.2383 | -0.2541 | 0.0317 | 1 | -0.2034 | -0.3177 |
| TREADMILL_STRESS_TEST | 0.0203 | -0.1409 | 0.8822 | 0.1294 | 0.2046 | -0.0188 | -0.2034 | 1 | -0.0115 |
| UNSTABLE_ANGINA | 0.068 | -0.2374 | -0.0094 | -0.3023 | -0.3224 | 0.0252 | -0.3177 | -0.0115 | 1 |

 

 


Accepted Solutions
Solution
‎05-10-2017 03:48 PM
Grand Advisor
Posts: 10,052

Re: proc corr

First, be aware that pasting many kinds of content into the main window of this message board reformats it, often removing leading blanks. In the original tables each variable name spans two rows, but when pasted here the spanning was undone and the second "row" of each pair was shifted to the left.

The second row of each pair contains the p-values for the hypothesis of zero correlation. If you don't want them, use the NOPROB option on the PROC CORR statement.

The main difference between the two outputs is rounding of the calculated coefficients: -0.04986 vs. -0.0499.
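
As a minimal sketch of that suggestion applied to the poster's own data (the data set For_Reg and the variable names come from the question; the output data set name CorrOut is only an illustrative assumption), NOPROB suppresses the p-value rows, and OUTP= saves the Pearson correlations so they can be compared with the PROC PRINCOMP matrix:

```sas
/* Print Pearson correlations only; NOPROB drops the p-value rows. */
/* OUTP= writes the correlations to a data set (CorrOut is a       */
/* hypothetical name chosen for this example).                     */
proc corr data=For_Reg noprob outp=CorrOut;
   var ADDITIONAL_VESSEL NSTEMI NUCLEAR_STRESS_TEST OTHER STABLE_ANGINA
       STAGED_INTERVENTIONS STEMI TREADMILL_STRESS_TEST UNSTABLE_ANGINA;
run;

/* The OUTP= data set also holds MEAN, STD and N rows; keep only   */
/* the correlation rows when printing.                             */
proc print data=CorrOut(where=(_TYPE_='CORR')) noobs;
run;
```

Both procedures report the same Pearson coefficients, so either one is suitable for inspecting pairwise correlation; only the number of displayed decimals differs.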



All Replies
SAS Super FREQ
Posts: 3,310

Re: proc corr

The second rows are p-values for the null hypothesis that tests whether a correlation coefficient is zero. You can turn off the p-values by using the NOPROB option, like this:

proc corr data=sashelp.class noprob;
run;

 

Trusted Advisor
Posts: 1,431

Re: proc corr


zhuxiaoyan1 wrote:

I'm doing multiple regression, but first I need to check whether my independent variables are correlated with each other. I have a lot of independent variables, and each one takes a value of either 1 or 0. I used PROC CORR to generate a correlation table.

 


Another way to determine the effect of correlation between the independent variables is to use the VIF (variance inflation factor) option in the MODEL statement of PROC REG.

 

According to the documentation:

 

"The VIF option in the MODEL statement provides the variance inflation factors (VIF). These factors measure the inflation in the variances of the parameter estimates due to collinearities that exist among the regressor (independent) variables. There are no formal criteria for deciding if a VIF is large enough to affect the predicted values."

 

A good example is: http://documentation.sas.com/?cdcId=statcdc&cdcVersion=14.2&docsetId=statug&docsetTarget=statug_reg_...
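
A rough sketch of what this could look like for the data in the question. The response variable is never named in the thread, so OUTCOME below is a hypothetical placeholder that would need to be replaced with the actual dependent variable; For_Reg and the regressor names are taken from the original post.

```sas
/* Request variance inflation factors for each regressor.          */
/* OUTCOME is a hypothetical name for the dependent variable; it   */
/* does not appear anywhere in the original post.                  */
proc reg data=For_Reg;
   model OUTCOME = ADDITIONAL_VESSEL NSTEMI NUCLEAR_STRESS_TEST OTHER
                   STABLE_ANGINA STAGED_INTERVENTIONS STEMI
                   TREADMILL_STRESS_TEST UNSTABLE_ANGINA / vif tol;
run;
quit;
```

The TOL option prints the tolerance (the reciprocal of the VIF) alongside each VIF. Given the pairwise correlations near 0.88 reported above (ADDITIONAL_VESSEL with STAGED_INTERVENTIONS, and NUCLEAR_STRESS_TEST with TREADMILL_STRESS_TEST), those pairs are the ones worth checking first in the VIF column.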
