deleted_user
Not applicable
Are there any differences between how proc corr computes a correlation matrix and the way that it is computed for other procs that rely on it?

For example, proc princomp is a statistical procedure that relies on the correlation matrix of a set of data. When I run proc princomp on my data, it returns the correlation matrix and eigenvectors relatively quickly. When I run proc corr on the same set of data, it takes over a day to finish.

Can anyone enlighten me on why that is?
3 REPLIES
Doc_Duke
Rhodochrosite | Level 12
PRINCOMP uses the Pearson correlation, and CORR should compute it the same way; it is easy and quick to compute (only SAS staff can tell you whether the two procedures use exactly the same routines). By default CORR computes only Pearson correlations, but if you also request a Spearman correlation (the SPEARMAN option), that statistic is rank based. If your dataset is large, it can take a long time to compute (if it fails to build the rank matrix in memory, it uses disk).

Look at the section on computational details for CORR. It says, "... If M bytes are not available, PROC CORR must process the data multiple times to compute all the statistics."

Doc Muhlbaier
Duke
deleted_user
Not applicable
I am working with a large dataset, but I am feeding the same dataset to both procs. Proc princomp can return with results in minutes. Proc corr can take several hours.

I understand that both procedures are attempting to build the Pearson correlation. I am just confused as to why one is so much faster than the other, all else being equal (i.e., same dataset, same statistic being computed).

If anything, princomp should take longer to run, since it requires more computation afterwards, including diagonalizing the matrix, which isn't trivial. I notice the same thing with other procs, such as proc varclus, which in theory is also based on the correlation matrix yet has much shorter run times than proc corr.

It isn't a big issue; anytime I want a correlation matrix, I just use princomp to get it for me. I'm just curious. Message was edited by: jwu1234
Rick_SAS
SAS Super FREQ
By default, PROC CORR uses pairwise deletion when observations contain missing values. PROC CORR includes all nonmissing pairs of values for each pair of variables in the statistical computations. Therefore, the correlation statistics might be based on different numbers of observations, and for p variables the PROC needs to examine p(p-1)/2 pairs of variables.
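
To make the pairwise behavior concrete, here is a small sketch. The data set `demo` and its values are invented for illustration and do not come from this thread. With the default settings, each pair of variables uses every observation in which both values are nonmissing, so the N reported for each correlation can differ.

```sas
/* Hypothetical toy data: y is missing in row 2, z is missing in row 3 */
data demo;
  input x y z;
  datalines;
1 2 3
2 . 5
3 4 .
4 5 8
5 7 9
;
run;

/* Default (pairwise deletion):                        */
/*   x-y uses rows 1,3,4,5 (N=4)                       */
/*   x-z uses rows 1,2,4,5 (N=4)                       */
/*   y-z uses rows 1,4,5   (N=3)                       */
proc corr data=demo;
  var x y z;
run;

/* NOMISS (listwise deletion): every pair uses rows 1,4,5 (N=3) */
proc corr data=demo nomiss;
  var x y z;
run;
```

Because the set of usable rows changes from pair to pair in the first call, the procedure cannot simply discard incomplete rows once up front.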

If you specify the NOMISS option, PROC CORR uses listwise deletion. Listwise deletion is what PRINCOMP and other STAT procs use. It is faster because you can delete any observation that contains a missing value and you never have to deal with that observation again.

So if you like the PRINCOMP way, you can use PROC CORR with the NOMISS option.
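
A minimal sketch of such a call, assuming a hypothetical input data set `work.big` with numeric variables `x1-x50` (neither name comes from this thread). `OUTP=` writes the Pearson statistics to a data set and `NOPRINT` suppresses the printed tables, which may be convenient when there are many variables.

```sas
/* work.big and x1-x50 are placeholder names, not from the original post */
proc corr data=work.big nomiss outp=work.corr_out noprint;
  var x1-x50;
run;

/* The OUTP= data set also holds MEAN, STD, and N rows; */
/* keep only the rows that contain correlations.        */
data work.corr_only;
  set work.corr_out;
  where _TYPE_ = 'CORR';
run;
```

Whether this matches the PRINCOMP run time on a given data set is something to check by timing both; the sketch only shows how to request listwise deletion.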
