proc corr
05-10-2017 03:34 PM

I'm doing multiple regression, but first I need to check whether there is correlation between my independent variables. I have a lot of independent variables, and each of them takes a value of either 1 or 0. I used proc corr to generate a correlation table.

```
proc corr data=For_Reg;
   var ADDITIONAL_VESSEL NSTEMI NUCLEAR_STRESS_TEST OTHER STABLE_ANGINA
       STAGED_INTERVENTIONS STEMI TREADMILL_STRESS_TEST UNSTABLE_ANGINA;
run;
```

and got this output:

Pearson Correlation Coefficients, N = 644
Prob > |r| under H0: Rho=0

| | ADDITIONAL_VESSEL | NSTEMI | NUCLEAR_STRESS_TEST | OTHER | STABLE_ANGINA | STAGED_INTERVENTIONS | STEMI | TREADMILL_STRESS_TEST | UNSTABLE_ANGINA |
|---|---|---|---|---|---|---|---|---|---|
| ADDITIONAL_VESSEL | 1 | -0.04986 | 0.02109 | 0.04884 | -0.0228 | 0.87831 | -0.06034 | 0.02035 | 0.06797 |
| | | 0.2063 | 0.5933 | 0.2158 | 0.5636 | <.0001 | 0.1261 | 0.6063 | 0.0848 |
| NSTEMI | -0.04986 | 1 | -0.11717 | -0.17803 | -0.18986 | -0.04254 | -0.18715 | -0.14088 | -0.23739 |
| | 0.2063 | | 0.0029 | <.0001 | <.0001 | 0.281 | <.0001 | 0.0003 | <.0001 |
| NUCLEAR_STRESS_TEST | 0.02109 | -0.11717 | 1 | 0.09684 | 0.20451 | -0.00098 | -0.1935 | 0.88221 | -0.00941 |
| | 0.5933 | 0.0029 | | 0.0139 | <.0001 | 0.9801 | <.0001 | <.0001 | 0.8115 |
| OTHER | 0.04884 | -0.17803 | 0.09684 | 1 | -0.24174 | 0.03422 | -0.23828 | 0.12936 | -0.30225 |
| | 0.2158 | <.0001 | 0.0139 | | <.0001 | 0.3859 | <.0001 | 0.001 | <.0001 |
| STABLE_ANGINA | -0.0228 | -0.18986 | 0.20451 | -0.24174 | 1 | -0.05808 | -0.25412 | 0.20464 | -0.32235 |
| | 0.5636 | <.0001 | <.0001 | <.0001 | | 0.141 | <.0001 | <.0001 | <.0001 |
| STAGED_INTERVENTIONS | 0.87831 | -0.04254 | -0.00098 | 0.03422 | -0.05808 | 1 | 0.03174 | -0.01883 | 0.02523 |
| | <.0001 | 0.281 | 0.9801 | 0.3859 | 0.141 | | 0.4214 | 0.6333 | 0.5227 |
| STEMI | -0.06034 | -0.18715 | -0.1935 | -0.23828 | -0.25412 | 0.03174 | 1 | -0.20337 | -0.31774 |
| | 0.1261 | <.0001 | <.0001 | <.0001 | <.0001 | 0.4214 | | <.0001 | <.0001 |
| TREADMILL_STRESS_TEST | 0.02035 | -0.14088 | 0.88221 | 0.12936 | 0.20464 | -0.01883 | -0.20337 | 1 | -0.0115 |
| | 0.6063 | 0.0003 | <.0001 | 0.001 | <.0001 | 0.6333 | <.0001 | | 0.7709 |
| UNSTABLE_ANGINA | 0.06797 | -0.23739 | -0.00941 | -0.30225 | -0.32235 | 0.02523 | -0.31774 | -0.0115 | 1 |
| | 0.0848 | <.0001 | 0.8115 | <.0001 | <.0001 | 0.5227 | <.0001 | 0.7709 | |

Why did I get two lines for each variable? Is it something to do with my data?

```
proc princomp data=For_Reg;
   var ADDITIONAL_VESSEL NSTEMI NUCLEAR_STRESS_TEST OTHER STABLE_ANGINA
       STAGED_INTERVENTIONS STEMI TREADMILL_STRESS_TEST UNSTABLE_ANGINA;
run;
```

This code produced a different correlation table, which makes more sense to me. Which one is the right procedure for my case?

Correlation Matrix

| | ADDITIONAL_VESSEL | NSTEMI | NUCLEAR_STRESS_TEST | OTHER | STABLE_ANGINA | STAGED_INTERVENTIONS | STEMI | TREADMILL_STRESS_TEST | UNSTABLE_ANGINA |
|---|---|---|---|---|---|---|---|---|---|
| ADDITIONAL_VESSEL | 1 | -0.0499 | 0.0211 | 0.0488 | -0.0228 | 0.8783 | -0.0603 | 0.0203 | 0.068 |
| NSTEMI | -0.0499 | 1 | -0.1172 | -0.178 | -0.1899 | -0.0425 | -0.1871 | -0.1409 | -0.2374 |
| NUCLEAR_STRESS_TEST | 0.0211 | -0.1172 | 1 | 0.0968 | 0.2045 | -0.001 | -0.1935 | 0.8822 | -0.0094 |
| OTHER | 0.0488 | -0.178 | 0.0968 | 1 | -0.2417 | 0.0342 | -0.2383 | 0.1294 | -0.3023 |
| STABLE_ANGINA | -0.0228 | -0.1899 | 0.2045 | -0.2417 | 1 | -0.0581 | -0.2541 | 0.2046 | -0.3224 |
| STAGED_INTERVENTIONS | 0.8783 | -0.0425 | -0.001 | 0.0342 | -0.0581 | 1 | 0.0317 | -0.0188 | 0.0252 |
| STEMI | -0.0603 | -0.1871 | -0.1935 | -0.2383 | -0.2541 | 0.0317 | 1 | -0.2034 | -0.3177 |
| TREADMILL_STRESS_TEST | 0.0203 | -0.1409 | 0.8822 | 0.1294 | 0.2046 | -0.0188 | -0.2034 | 1 | -0.0115 |
| UNSTABLE_ANGINA | 0.068 | -0.2374 | -0.0094 | -0.3023 | -0.3224 | 0.0252 | -0.3177 | -0.0115 | 1 |

Accepted Solutions

Solution

05-10-2017 03:48 PM

05-10-2017 03:45 PM

First, be aware that pasting many kinds of content into the main window of this message board reformats them, often removing leading blanks. In the original tables each variable name spans two rows, but when pasted here the spanning was undone and the second row for each variable was shifted to the left.

The second row for each variable contains the p-values for the hypothesis of zero correlation. If you don't want them, use the NOPROB option on the PROC CORR statement.
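Applied to the variables from the original post, a minimal sketch might look like this. It assumes the data set For_Reg and the variable names shown in the question; NOSIMPLE is a further PROC CORR option that suppresses the simple (descriptive) statistics table, so mainly the correlation matrix is printed.

```sas
/* Pearson correlations among the 0/1 regressors, without the p-value rows */
proc corr data=For_Reg noprob nosimple;
   var ADDITIONAL_VESSEL NSTEMI NUCLEAR_STRESS_TEST OTHER STABLE_ANGINA
       STAGED_INTERVENTIONS STEMI TREADMILL_STRESS_TEST UNSTABLE_ANGINA;
run;
```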

The main difference between the two is the rounding of the displayed coefficients, for example -0.04986 vs -0.0499.
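One way to check that the two procedures compute the same Pearson coefficients is to save the PROC CORR matrix with the OUTP= option and print it with four decimals, the precision PROC PRINCOMP displayed in the question. This is a sketch under the same assumptions (data set For_Reg and the variable names from the question); the output data set name corr_pearson is hypothetical.

```sas
/* Save the Pearson correlation matrix to a data set; NOPRINT suppresses the printed tables */
proc corr data=For_Reg noprint outp=corr_pearson;
   var ADDITIONAL_VESSEL NSTEMI NUCLEAR_STRESS_TEST OTHER STABLE_ANGINA
       STAGED_INTERVENTIONS STEMI TREADMILL_STRESS_TEST UNSTABLE_ANGINA;
run;

/* Keep only the correlation rows and display every numeric value with 4 decimals */
proc print data=corr_pearson(where=(_TYPE_='CORR')) noobs;
   format _numeric_ 8.4;
run;
```

With the 8.4 format, an entry such as -0.04986 should display as -0.0499, matching the PROC PRINCOMP table.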

All Replies

05-10-2017 03:41 PM

The second rows are p-values for testing the null hypothesis that a correlation coefficient is zero. You can turn off the p-values by using the NOPROB option, like this:

```
proc corr data=sashelp.class noprob;   /* NOPROB suppresses the p-values */
run;
```

05-11-2017 09:14 AM - edited 05-11-2017 09:15 AM

zhuxiaoyan1 wrote:

> I'm doing multiple regression, but first I need to check whether there is correlation between my independent variables. I have a lot of independent variables, and each of them takes a value of either 1 or 0. I used proc corr to generate a correlation table.

Another way to determine the effect of correlation between the independent variables is to use the VIF (variance inflation factor) option in the MODEL statement of PROC REG.

According to the documentation:

"The VIF option in the MODEL statement provides the variance inflation factors (VIF). These factors measure the inflation in the variances of the parameter estimates due to collinearities that exist among the regressor (independent) variables. There are no formal criteria for deciding if a VIF is large enough to affect the predicted values."
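As a supplement to the quoted documentation, the usual definition is sketched below: the VIF for the j-th regressor is based on the R² obtained by regressing that regressor on all the other regressors in the model. The symbols R_j^2, VIF_j and TOL_j are introduced here for illustration; the tolerance 1 - R_j^2 is what the TOL option of PROC REG reports.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \qquad \mathrm{TOL}_j = 1 - R_j^2
```

For example, R_j^2 = 0.9 gives VIF_j = 10. Because R_j^2 can be no smaller than the squared correlation of x_j with any single other regressor in the model, the correlation of 0.87831 between ADDITIONAL_VESSEL and STAGED_INTERVENTIONS in the question implies a VIF of at least 1/(1 - 0.87831^2) ≈ 4.4 for each of those two variables in a model that contains both.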

A good example is: http://documentation.sas.com/?cdcId=statcdc&cdcVersion=14.2&docsetId=statug&docsetTarget=statug_reg_...
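A minimal sketch of the VIF approach for the variables in the question. The dependent variable name `outcome` is hypothetical (the thread does not name it); VIF and TOL are MODEL statement options in PROC REG that add variance inflation and tolerance columns to the parameter estimates table.

```sas
/* 'outcome' is a hypothetical name for the dependent variable */
proc reg data=For_Reg;
   model outcome = ADDITIONAL_VESSEL NSTEMI NUCLEAR_STRESS_TEST OTHER
                   STABLE_ANGINA STAGED_INTERVENTIONS STEMI
                   TREADMILL_STRESS_TEST UNSTABLE_ANGINA / vif tol;
run;
quit;   /* PROC REG is interactive, so QUIT ends it */
```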