Proc Corr

01-26-2009 03:01 PM

Are there any differences between how proc corr computes a correlation matrix and how other procs that rely on a correlation matrix compute it?

For example, proc princomp is a statistical procedure that relies on the correlation matrix of a set of data. When I run proc princomp on my data, it returns the correlation matrix and eigenvectors relatively quickly. When I run proc corr on the same data, it takes over a day to finish.

Can anyone enlighten me on why that is?

Posted in reply to deleted_user

01-27-2009 09:12 AM

PRINCOMP uses the Pearson correlation, and that should be computed the same way in both procedures; it is easy (and quick) to compute (only the SAS staff can tell you whether it uses the exact same routines). CORR also computes only the Pearson correlation by default, but if you request a rank-based statistic such as the Spearman correlation with the SPEARMAN option, that can take a long time to compute on a large dataset (if it fails to build the rank matrix in memory, it uses disk).

Look at the section on computational details for CORR. It says, "... If M bytes are not available, PROC CORR must process the data multiple times to compute all the statistics."

Doc Muhlbaier

Duke

Posted in reply to Doc_Duke

01-30-2009 01:45 PM

I am working with a large dataset, but I am feeding the same dataset to both procs. Proc princomp returns results in minutes. Proc corr can take several hours.

I understand that both procedures are building the Pearson correlation; I am just confused as to why one is so much faster than the other, all else being equal (i.e., same dataset, same statistic being computed).

If anything, princomp should take longer to run, since it requires more computation afterwards, including diagonalizing the matrix, which isn't trivial. I notice the same thing with other procs, such as proc varclus, which in theory is also based on the correlation matrix yet has much shorter run times than proc corr.

It isn't a big issue; any time I want a correlation matrix, I just use princomp to get it for me. Just curious.

Message was edited by: jwu1234

Posted in reply to deleted_user

05-29-2009 03:35 PM

By default, PROC CORR uses pairwise deletion when observations contain missing values: for each pair of variables, it includes every observation in which both values are nonmissing. As a result, the correlation statistics might be based on different numbers of observations, and the procedure has to handle each of the p(p-1)/2 pairs of variables separately.

If you specify the NOMISS option, PROC CORR uses listwise deletion. Listwise deletion is what PRINCOMP and other STAT procs use. It is faster because you can delete any observation that contains a missing value and you never have to deal with that observation again.

So if you like the PRINCOMP way, you can use PROC CORR with the NOMISS option.
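
To make the pairwise versus listwise distinction concrete, here is a minimal sketch in plain Python. It is only an illustration of the two deletion rules, not how SAS implements either procedure, and it says nothing about their actual run times. The function names (`pearson`, `corr_listwise`, `corr_pairwise`) and the toy data are made up for this example, with `None` standing in for a missing value.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences with no missing values."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def corr_listwise(rows):
    """Listwise deletion: drop every row with a missing value once, then correlate."""
    complete = [r for r in rows if None not in r]
    p = len(rows[0])
    cols = list(zip(*complete))
    return {(i, j): (pearson(cols[i], cols[j]), len(complete))
            for i in range(p) for j in range(i + 1, p)}

def corr_pairwise(rows):
    """Pairwise deletion: for each of the p(p-1)/2 variable pairs,
    keep the rows where both values of that pair are present."""
    p = len(rows[0])
    out = {}
    for i in range(p):
        for j in range(i + 1, p):
            pairs = [(r[i], r[j]) for r in rows
                     if r[i] is not None and r[j] is not None]
            xs, ys = zip(*pairs)
            out[(i, j)] = (pearson(xs, ys), len(pairs))
    return out

# Toy data (hypothetical values): 3 variables, None marks a missing value.
rows = [
    (1.0, 2.0, 3.0),
    (2.0, 4.1, None),
    (3.0, 6.2, 1.0),
    (4.0, 7.9, 4.0),
    (None, 9.0, 2.0),
    (6.0, 12.1, 5.0),
]

listwise = corr_listwise(rows)
pairwise = corr_pairwise(rows)
for pair in sorted(pairwise):
    print(pair, "listwise n =", listwise[pair][1], "pairwise n =", pairwise[pair][1])
```

With listwise deletion every correlation is computed from the same set of complete rows, so the filtering happens once. With pairwise deletion each pair of variables can end up with its own subset of rows and its own n, which is the extra bookkeeping described above.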
