Solved: Re: proc corr

zhuxiaoyan1 · Posted 05-10-2017 03:34 PM

I'm doing multiple regression, but first I need to check whether my independent variables are correlated with each other. I have a lot of independent variables, and each one takes a value of either 1 or 0. I used proc corr to generate a correlation table.

proc corr data=For_Reg;
   var ADDITIONAL_VESSEL NSTEMI NUCLEAR_STRESS_TEST OTHER STABLE_ANGINA STAGED_INTERVENTIONS STEMI TREADMILL_STRESS_TEST
       UNSTABLE_ANGINA;
run;

and get this output:

Pearson Correlation Coefficients, N = 644
Prob > |r| under H0: Rho=0
	ADDITIONAL_VESSEL	NSTEMI	NUCLEAR_STRESS_TEST	OTHER	STABLE_ANGINA	STAGED_INTERVENTIONS	STEMI	TREADMILL_STRESS_TEST	UNSTABLE_ANGINA
ADDITIONAL_VESSEL	1	-0.04986	0.02109	0.04884	-0.0228	0.87831	-0.06034	0.02035	0.06797
	0.2063	0.5933	0.2158	0.5636	<.0001	0.1261	0.6063	0.0848
NSTEMI	-0.04986	1	-0.11717	-0.17803	-0.18986	-0.04254	-0.18715	-0.14088	-0.23739
0.2063		0.0029	<.0001	<.0001	0.281	<.0001	0.0003	<.0001
NUCLEAR_STRESS_TEST	0.02109	-0.11717	1	0.09684	0.20451	-0.00098	-0.1935	0.88221	-0.00941
0.5933	0.0029		0.0139	<.0001	0.9801	<.0001	<.0001	0.8115
OTHER	0.04884	-0.17803	0.09684	1	-0.24174	0.03422	-0.23828	0.12936	-0.30225
0.2158	<.0001	0.0139		<.0001	0.3859	<.0001	0.001	<.0001
STABLE_ANGINA	-0.0228	-0.18986	0.20451	-0.24174	1	-0.05808	-0.25412	0.20464	-0.32235
0.5636	<.0001	<.0001	<.0001		0.141	<.0001	<.0001	<.0001
STAGED_INTERVENTIONS	0.87831	-0.04254	-0.00098	0.03422	-0.05808	1	0.03174	-0.01883	0.02523
<.0001	0.281	0.9801	0.3859	0.141		0.4214	0.6333	0.5227
STEMI	-0.06034	-0.18715	-0.1935	-0.23828	-0.25412	0.03174	1	-0.20337	-0.31774
0.1261	<.0001	<.0001	<.0001	<.0001	0.4214		<.0001	<.0001
TREADMILL_STRESS_TEST	0.02035	-0.14088	0.88221	0.12936	0.20464	-0.01883	-0.20337	1	-0.0115
0.6063	0.0003	<.0001	0.001	<.0001	0.6333	<.0001		0.7709
UNSTABLE_ANGINA	0.06797	-0.23739	-0.00941	-0.30225	-0.32235	0.02523	-0.31774	-0.0115	1
0.0848	<.0001	0.8115	<.0001	<.0001	0.5227	<.0001	0.7709

Why did I get two lines for each variable? Is it something to do with my data?

proc princomp data=For_Reg;
   var ADDITIONAL_VESSEL NSTEMI NUCLEAR_STRESS_TEST OTHER STABLE_ANGINA STAGED_INTERVENTIONS STEMI TREADMILL_STRESS_TEST
       UNSTABLE_ANGINA;
run;

This code produced a different correlation table, which makes more sense. Which one is the right procedure for my case?

Correlation Matrix
	ADDITIONAL_VESSEL	NSTEMI	NUCLEAR_STRESS_TEST	OTHER	STABLE_ANGINA	STAGED_INTERVENTIONS	STEMI	TREADMILL_STRESS_TEST	UNSTABLE_ANGINA
ADDITIONAL_VESSEL	1	-0.0499	0.0211	0.0488	-0.0228	0.8783	-0.0603	0.0203	0.068
NSTEMI	-0.0499	1	-0.1172	-0.178	-0.1899	-0.0425	-0.1871	-0.1409	-0.2374
NUCLEAR_STRESS_TEST	0.0211	-0.1172	1	0.0968	0.2045	-0.001	-0.1935	0.8822	-0.0094
OTHER	0.0488	-0.178	0.0968	1	-0.2417	0.0342	-0.2383	0.1294	-0.3023
STABLE_ANGINA	-0.0228	-0.1899	0.2045	-0.2417	1	-0.0581	-0.2541	0.2046	-0.3224
STAGED_INTERVENTIONS	0.8783	-0.0425	-0.001	0.0342	-0.0581	1	0.0317	-0.0188	0.0252
STEMI	-0.0603	-0.1871	-0.1935	-0.2383	-0.2541	0.0317	1	-0.2034	-0.3177
TREADMILL_STRESS_TEST	0.0203	-0.1409	0.8822	0.1294	0.2046	-0.0188	-0.2034	1	-0.0115
UNSTABLE_ANGINA	0.068	-0.2374	-0.0094	-0.3023	-0.3224	0.0252	-0.3177	-0.0115	1

ballardw · Posted 05-10-2017 03:45 PM

First, be aware that pasting many kinds of content into the main window on this message board reformats it, often removing leading blanks. In the original table each variable name spans two rows, but when pasted here the spanning was undone and the second rows were all shifted to the left.

The second row holds the p-values for the hypothesis of zero correlation. If you don't want them, use the NOPROB option on the PROC CORR statement.

The main difference between the two tables is the rounding of the calculated coefficients: -0.04986 vs. -0.0499.
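
As a minimal sketch of that suggestion applied to the data set in the question: NOPROB drops the p-value rows, and NOSIMPLE (an extra option not mentioned in the thread) also suppresses the table of simple descriptive statistics, so the output is just the coefficient matrix. The data set and variable names are the ones from the original post.

```sas
/* Correlation matrix only: no p-value rows, no descriptive statistics */
proc corr data=For_Reg noprob nosimple;
   var ADDITIONAL_VESSEL NSTEMI NUCLEAR_STRESS_TEST OTHER STABLE_ANGINA
       STAGED_INTERVENTIONS STEMI TREADMILL_STRESS_TEST UNSTABLE_ANGINA;
run;
```

With the p-values removed, the table should have the same layout as the PROC PRINCOMP correlation matrix, differing only in the number of decimals shown.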


Rick_SAS · Posted 05-10-2017 03:41 PM

The second rows are p-values for the test of the null hypothesis that a correlation coefficient is zero. You can turn off the p-values by using the NOPROB option, like this:

proc corr data=sashelp.class noprob;
run;
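
To see where a number in the second row comes from, here is a minimal sketch that recomputes one p-value by hand. It assumes the usual t test for a Pearson correlation, t = r * sqrt((n - 2) / (1 - r**2)) with n - 2 degrees of freedom, and plugs in r = -0.04986 and N = 644 from the output in the question. The variable names r, n, t, and p are illustrative only.

```sas
/* Recompute the two-sided p-value for one correlation coefficient */
data _null_;
   r = -0.04986;                          /* ADDITIONAL_VESSEL vs. NSTEMI       */
   n = 644;                               /* number of observations             */
   t = r * sqrt((n - 2) / (1 - r**2));    /* t statistic for H0: Rho = 0        */
   p = 2 * (1 - probt(abs(t), n - 2));    /* two-sided tail probability         */
   put t= p=;                             /* write the results to the SAS log   */
run;
```

The value of p written to the log should land close to the 0.2063 that PROC CORR reports for this pair.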

PaigeMiller · Posted 05-11-2017 09:14 AM

@zhuxiaoyan1 wrote:

I'm doing multiple regression, but first I need to check if there is correlation between my independent variables. I have a lot of independent variables and my independent variables have value of either 1 or 0. I used proc corr to generate correlation table.

Another way to determine the effect of correlation between the independent variables is to use the VIF (variance inflation factor) option in the MODEL statement of PROC REG.

According to the documentation:

"The VIF option in the MODEL statement provides the variance inflation factors (VIF). These factors measure the inflation in the variances of the parameter estimates due to collinearities that exist among the regressor (independent) variables. There are no formal criteria for deciding if a VIF is large enough to affect the predicted values."

A good example is: http://documentation.sas.com/?cdcId=statcdc&cdcVersion=14.2&docsetId=statug&docsetTarget=statug_reg_...
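
A minimal sketch of what that could look like for the data in this thread. The response variable is never named in the question, so `Y` below is a placeholder assumption to be replaced with the actual dependent variable; the predictors are the ones from the original PROC CORR call.

```sas
/* Y is a hypothetical name for the dependent variable (not given in the thread) */
proc reg data=For_Reg;
   model Y = ADDITIONAL_VESSEL NSTEMI NUCLEAR_STRESS_TEST OTHER STABLE_ANGINA
             STAGED_INTERVENTIONS STEMI TREADMILL_STRESS_TEST UNSTABLE_ANGINA
             / vif;   /* adds a variance inflation factor column to the estimates */
run;
quit;
```

For each predictor the VIF equals 1 / (1 - R**2), where R**2 comes from regressing that predictor on all the others. Since that R**2 is at least the squared pairwise correlation, the two pairs correlated at about 0.88 in the output above (ADDITIONAL_VESSEL with STAGED_INTERVENTIONS, and NUCLEAR_STRESS_TEST with TREADMILL_STRESS_TEST) imply VIFs of roughly 4.4 or more for those variables.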

--
Paige Miller
