Hi All,
I need to do a correlation analysis on 10,000 variables named cerp, cerl, cfty...... and so on and 8 variables red, blue, green..... and so on. Instead of calling for all 10000 variables is there a way we can do the correlation on 10000 variables without listing. Thanks for the help in advance.
Esita
%macro corr_data (var_1, var_2);
proc corr data= cell;
var &var_1 &var_2;
run;
%mend corr_data;
%corr_data (cerp, red);
%corr_data (cerl, blue);
%corr_data (cfty, green);
%corr_data (cerp, red);
%corr_data (cerl, blue);
%corr_data (cfty, green);
Paige is extremely skeptical that performing correlations on 10,000 variables is a good idea from a statistical point of view. In fact, you will likely get many "significant" correlations just by random chance, rather than because there is a real association between the variables. You will also be misled by the multi-collinearity between variables, and well I can't see how this will lead to any relevant conclusions here.
Nevertheless, if you want to do this in SAS, you can create macro variables containing the names of SAS variables of interest from the contents of your data set.
Something like (untested code)
proc contents data=your_sas_data_set noprint out=_cont_;
run;
proc sql noprint;
select distinct name into :names separated by ' ' from _cont_ where name ^in ('red','green','blue');
select distinct name into :names2 separated by ' ' from _cont_ where name in ('red','green','blue');
quit;
ods output outpearson=corrs;
proc corr data=your_sas_data_set noprint;
var &names;
with &names2;
run;
If your variables starting with 'c' then you can try something like this
proc corr data=cell;
var red green blue;
with c:;
run;
No it starts with different letters
Then you can use macro variables as recommended by PaigeMiller.
If its all numerical variables you can use the _numerical_ or _num_ shortcut.
Though you'd better be saving that output to a dataset to automatically go through other wise you'll simply miss correlations.
Paige is extremely skeptical that performing correlations on 10,000 variables is a good idea from a statistical point of view. In fact, you will likely get many "significant" correlations just by random chance, rather than because there is a real association between the variables. You will also be misled by the multi-collinearity between variables, and well I can't see how this will lead to any relevant conclusions here.
Nevertheless, if you want to do this in SAS, you can create macro variables containing the names of SAS variables of interest from the contents of your data set.
Something like (untested code)
proc contents data=your_sas_data_set noprint out=_cont_;
run;
proc sql noprint;
select distinct name into :names separated by ' ' from _cont_ where name ^in ('red','green','blue');
select distinct name into :names2 separated by ' ' from _cont_ where name in ('red','green','blue');
quit;
ods output outpearson=corrs;
proc corr data=your_sas_data_set noprint;
var &names;
with &names2;
run;
It sounds like you don't want printed output i.e. "without listing".
proc corr data=cell outp= pcorr noprint;
with red blue green ; /* your list of 8 goes here*/
run;
will send Pearson correlations to the set pcorr and generate no listing out put. You can specify other output data sets for Spearman, Hoeffding and Kendal statistics. The output will have correlations for all of the numeric variables compared with the ones on the WITH statement.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.