turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- corrlation on multiple variables

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-02-2014 11:01 AM

Hi All,

I need to do a correlation analysis on 10,000 variables named cerp, cerl, cfty...... and so on and 8 variables red, blue, green..... and so on. Instead of calling for all 10000 variables is there a way we can do the correlation on 10000 variables without listing. Thanks for the help in advance.

Esita

%macro corr_data (var_1, var_2);

proc corr data= cell;

var &var_1 &var_2;

run;

%mend corr_data;

%corr_data (cerp, red);

%corr_data (cerl, blue);

%corr_data (cfty, green);

%corr_data (cerp, red);

%corr_data (cerl, blue);

%corr_data (cfty, green);

Accepted Solutions

Solution

10-02-2014
11:20 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-02-2014 11:20 AM

Paige is extremely skeptical that performing correlations on 10,000 variables is a good idea from a statistical point of view. In fact, you will likely get many "significant" correlations just by random chance, rather than because there is a real association between the variables. You will also be misled by the multi-collinearity between variables, and well I can't see how this will lead to any relevant conclusions here.

Nevertheless, if you want to do this in SAS, you can create macro variables containing the names of SAS variables of interest from the contents of your data set.

Something like (untested code)

proc contents data=your_sas_data_set noprint out=_cont_;

run;

proc sql noprint;

select distinct name into :names separated by ' ' from _cont_ where name ^in ('red','green','blue');

select distinct name into :names2 separated by ' ' from _cont_ where name in ('red','green','blue');

quit;

ods output outpearson=corrs;

proc corr data=your_sas_data_set noprint;

var &names;

with &names2;

run;

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-02-2014 11:14 AM

If your variables starting with 'c' then you can try something like this

proc corr data=cell;

var red green blue;

with c:;

run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-02-2014 11:21 AM

No it starts with different letters

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-02-2014 11:33 AM

Then you can use macro variables as recommended by **PaigeMiller**.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-02-2014 11:33 AM

If its all numerical variables you can use the _numerical_ or _num_ shortcut.

Though you'd better be saving that output to a dataset to automatically go through other wise you'll simply miss correlations.

Solution

10-02-2014
11:20 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-02-2014 11:20 AM

Paige is extremely skeptical that performing correlations on 10,000 variables is a good idea from a statistical point of view. In fact, you will likely get many "significant" correlations just by random chance, rather than because there is a real association between the variables. You will also be misled by the multi-collinearity between variables, and well I can't see how this will lead to any relevant conclusions here.

Nevertheless, if you want to do this in SAS, you can create macro variables containing the names of SAS variables of interest from the contents of your data set.

Something like (untested code)

proc contents data=your_sas_data_set noprint out=_cont_;

run;

proc sql noprint;

select distinct name into :names separated by ' ' from _cont_ where name ^in ('red','green','blue');

select distinct name into :names2 separated by ' ' from _cont_ where name in ('red','green','blue');

quit;

ods output outpearson=corrs;

proc corr data=your_sas_data_set noprint;

var &names;

with &names2;

run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-02-2014 11:20 AM

It sounds like you don't want printed output i.e. "without listing".

proc corr data=cell outp= pcorr noprint;

with red blue green ; /* your list of 8 goes here*/

run;

will send Pearson correlations to the set pcorr and generate no listing out put. You can specify other output data sets for Spearman, Hoeffding and Kendal statistics. The output will have correlations for all of the numeric variables compared with the ones on the WITH statement.