- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi All,
I need to do a correlation analysis on 10,000 variables named cerp, cerl, cfty...... and so on and 8 variables red, blue, green..... and so on. Instead of calling for all 10000 variables is there a way we can do the correlation on 10000 variables without listing. Thanks for the help in advance.
Esita
%macro corr_data (var_1, var_2);
proc corr data= cell;
var &var_1 &var_2;
run;
%mend corr_data;
%corr_data (cerp, red);
%corr_data (cerl, blue);
%corr_data (cfty, green);
%corr_data (cerp, red);
%corr_data (cerl, blue);
%corr_data (cfty, green);
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Paige is extremely skeptical that performing correlations on 10,000 variables is a good idea from a statistical point of view. In fact, you will likely get many "significant" correlations just by random chance, rather than because there is a real association between the variables. You will also be misled by the multi-collinearity between variables, and well I can't see how this will lead to any relevant conclusions here.
Nevertheless, if you want to do this in SAS, you can create macro variables containing the names of SAS variables of interest from the contents of your data set.
Something like (untested code)
proc contents data=your_sas_data_set noprint out=_cont_;
run;
proc sql noprint;
select distinct name into :names separated by ' ' from _cont_ where name ^in ('red','green','blue');
select distinct name into :names2 separated by ' ' from _cont_ where name in ('red','green','blue');
quit;
ods output outpearson=corrs;
proc corr data=your_sas_data_set noprint;
var &names;
with &names2;
run;
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If your variables starting with 'c' then you can try something like this
proc corr data=cell;
var red green blue;
with c:;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
No it starts with different letters
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Then you can use macro variables as recommended by PaigeMiller.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If its all numerical variables you can use the _numerical_ or _num_ shortcut.
Though you'd better be saving that output to a dataset to automatically go through other wise you'll simply miss correlations.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Paige is extremely skeptical that performing correlations on 10,000 variables is a good idea from a statistical point of view. In fact, you will likely get many "significant" correlations just by random chance, rather than because there is a real association between the variables. You will also be misled by the multi-collinearity between variables, and well I can't see how this will lead to any relevant conclusions here.
Nevertheless, if you want to do this in SAS, you can create macro variables containing the names of SAS variables of interest from the contents of your data set.
Something like (untested code)
proc contents data=your_sas_data_set noprint out=_cont_;
run;
proc sql noprint;
select distinct name into :names separated by ' ' from _cont_ where name ^in ('red','green','blue');
select distinct name into :names2 separated by ' ' from _cont_ where name in ('red','green','blue');
quit;
ods output outpearson=corrs;
proc corr data=your_sas_data_set noprint;
var &names;
with &names2;
run;
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It sounds like you don't want printed output i.e. "without listing".
proc corr data=cell outp= pcorr noprint;
with red blue green ; /* your list of 8 goes here*/
run;
will send Pearson correlations to the set pcorr and generate no listing out put. You can specify other output data sets for Spearman, Hoeffding and Kendal statistics. The output will have correlations for all of the numeric variables compared with the ones on the WITH statement.