DATA Step, Macro, Functions and more

corrlation on multiple variables

Accepted Solution Solved
Reply
Contributor
Posts: 25
Accepted Solution

corrlation on multiple variables

Hi All,

I need to do a correlation analysis on 10,000 variables  named cerp, cerl, cfty...... and so on and 8 variables red, blue, green..... and so on. Instead of calling for all 10000 variables is there a way we can do the correlation on 10000 variables without listing. Thanks for the help in advance.

Esita

%macro corr_data (var_1, var_2);

proc corr data= cell;

var &var_1 &var_2;

run;

%mend corr_data;

%corr_data (cerp, red);

%corr_data (cerl, blue);

%corr_data (cfty, green);

%corr_data (cerp, red);

%corr_data (cerl, blue);

%corr_data (cfty, green);


Accepted Solutions
Solution
‎10-02-2014 11:20 AM
Trusted Advisor
Posts: 1,933

Re: corrlation on multiple variables

Paige is extremely skeptical that performing correlations on 10,000 variables is a good idea from a statistical point of view. In fact, you will likely get many "significant" correlations just by random chance, rather than because there is a real association between the variables. You will also be misled by the multi-collinearity between variables, and well I can't see how this will lead to any relevant conclusions here.

Nevertheless, if you want to do this in SAS, you can create macro variables containing the names of SAS variables of interest from the contents of your data set.

Something like (untested code)

proc contents data=your_sas_data_set noprint out=_cont_;

run;

proc sql noprint;

     select distinct name into :names separated by ' ' from _cont_ where name ^in ('red','green','blue');

     select distinct name into :names2 separated by ' ' from _cont_ where name in ('red','green','blue');

quit;

ods output outpearson=corrs;

proc corr data=your_sas_data_set noprint;

     var &names;

     with &names2;

run;

View solution in original post


All Replies
Trusted Advisor
Posts: 1,231

Re: corrlation on multiple variables

If your variables starting with 'c' then  you can try something like this

proc corr data=cell;

var red green blue;

with c:;

run;

Contributor
Posts: 25

Re: corrlation on multiple variables

No it starts with different letters

Trusted Advisor
Posts: 1,231

Re: corrlation on multiple variables

Then you can use macro variables as recommended by .

Super User
Posts: 19,872

Re: corrlation on multiple variables

If its all numerical variables you can use the _numerical_ or _num_ shortcut. 

Though you'd better be saving that output to a dataset to automatically go through other wise you'll simply miss correlations. 

Solution
‎10-02-2014 11:20 AM
Trusted Advisor
Posts: 1,933

Re: corrlation on multiple variables

Paige is extremely skeptical that performing correlations on 10,000 variables is a good idea from a statistical point of view. In fact, you will likely get many "significant" correlations just by random chance, rather than because there is a real association between the variables. You will also be misled by the multi-collinearity between variables, and well I can't see how this will lead to any relevant conclusions here.

Nevertheless, if you want to do this in SAS, you can create macro variables containing the names of SAS variables of interest from the contents of your data set.

Something like (untested code)

proc contents data=your_sas_data_set noprint out=_cont_;

run;

proc sql noprint;

     select distinct name into :names separated by ' ' from _cont_ where name ^in ('red','green','blue');

     select distinct name into :names2 separated by ' ' from _cont_ where name in ('red','green','blue');

quit;

ods output outpearson=corrs;

proc corr data=your_sas_data_set noprint;

     var &names;

     with &names2;

run;

Super User
Posts: 11,343

Re: corrlation on multiple variables

It sounds like you don't want printed output i.e. "without listing".

proc corr data=cell outp= pcorr noprint;

with red blue green  ; /* your list of 8 goes here*/

run;

will send Pearson correlations to the set pcorr and generate no listing out put. You can specify other output data sets for Spearman, Hoeffding and Kendal statistics. The output will have correlations for all of the numeric variables compared with the ones on the WITH statement.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 396 views
  • 0 likes
  • 5 in conversation