Hi all,
I admit that this question could be already answered before. At least, in separate parts, but.....
I'm a newbie 🙂 and can't join it together.
I need to do a proc corr through all variables from a data set (HAVE). I need to do it in a table with the results in pairs (if i use proc corr all variables it will create a triangular matrix with repetitions - VAR1 with VAR 2 and Var 2 with var 1)
So, I want to make it with this steps:
1- A loop that gets the VAR1 name, and the VAR 2 and make a proc corr with this two variables and write it to a new table:
VAR1 (name); VAR2 (name); result
2- Next cicle: VAR1 and VAR 3, and so on:
VAR1 (name); VAR2 (name); result
VAR1 (name); VAR3 (name); result
.
.
.
VAR1 (name); VARN (name); result, where N=count variables of the dataset (HAVE)
3- When this is complete, it should go to the next VAR and write:
VAR1 (name); VAR2 (name); result
VAR1 (name); VAR3 (name); result
...
VAR1 (name); VARN (name); result
+
VAR2 (name); VAR3 (name); result
VAR2 (name); VAR4 (name); result
...
VAR2 (name); VARN (name); result.
This means, that there is a loop that takes a VAR i and a second loop witch begins in VAR i+1 (because i don´t need to repeat VAR2 with VAR 1 again)
Resuming,
1-get the variables names with a loop
2-get variables names inside previous loop
3-execute proc corr
4-Write the results (VAR names and r value) in a separate table.
The proc corr, should be something like this:
proc corr data=HAVE; var VAR1 VAR2; run;
THANK YOU IN ADVANCE FOR YOUR PRECIOUS THINKING TIME!
Krauss
Why would you want to run PROC CORR multiple times when you can run it once and get all of the correlations at once?
You can just reformat the output if you want. Here is a method using PROC TRANSPOSE.
proc corr data=sashelp.class noprint outp=step1 ;
var _numeric_;
run;
proc print data=step1;
run;
proc transpose data=step1 name=var2
out=want(keep=var2 _name_ col1
rename=(col1=corr _name_=var1)
where=(var1 < var2)
)
;
where _type_='CORR';
by _name_ notsorted ;
var _numeric_ ;
run;
proc print data=want;
run;
Results:
The SAS System Obs _TYPE_ _NAME_ Age Height Weight 1 MEAN 13.3158 62.3368 100.026 2 STD 1.4927 5.1271 22.774 3 N 19.0000 19.0000 19.000 4 CORR Age 1.0000 0.8114 0.741 5 CORR Height 0.8114 1.0000 0.878 6 CORR Weight 0.7409 0.8778 1.000 The SAS System Obs var1 var2 corr 1 Age Height 0.81143 2 Age Weight 0.74089 3 Height Weight 0.87779
If I am understanding you properly, you can work with the correlation matrix, and just loop through it to get the results you are asking for. There is no need to include PROC CORR in the loop; in fact including PROC CORR in the loop would be extremely inefficient.
proc corr nosimple data=sashelp.cars outp=corrs noprint;
var _numeric_;
run;
data _null_;
length string $ 128;
set corrs(where=(_type_='CORR'));
array x _numeric_;
do i=1 to dim(x);
string = cat("Correlation Between ",trim(_name_)," and ",vname(x(i)),'= ',x(i));
put string;
end;
run;
If you don't want the correlations on the diagonal, don't write them out. If you only want the lower or upper part of the correlation matrix, you can eliminate the parts you don't want. Naturally, if you don't want results in this format, its easy to produce any other format you'd like, such as a SAS data set with each correlation in its own row.
Here's an approach using a data _null_ step:
data _null_;
set sashelp.class(obs=1) indsname=dsname;
array num_vars(3) Age Height Weight;
do i=2 to dim(num_vars);
call execute(cat(
'proc corr data=',dsname,' out=','output_',vname(num_vars(1)),'_',vname(num_vars(i)),'; ',
'var ',vname(num_vars(1)),' ',vname(num_vars(i)),';',
'run;'
)
);
end;
run;
-unison
So, you basically want to execute a PROC CORR for all the variables (or a well defined subset of variables) in a table?
The easiest thing is probably to use SQL to generate the code, e.g.:
proc sql noprint;
select catx(' ','PROC CORR data=SASHELP.CLASS outs=',catx('_','corr',a.name,b.name),'noprint;var',a.name,b.name,';run;') into :doit separated by ';'
from dictionary.columns a join dictionary.columns b on
a.libname=b.libname and a.memname=b.memname and a.name<b.name
where a.libname='SASHELP' and a.memname='CLASS' and a.type='num' and b.type='num';
quit;
&doit
Why would you want to run PROC CORR multiple times when you can run it once and get all of the correlations at once?
You can just reformat the output if you want. Here is a method using PROC TRANSPOSE.
proc corr data=sashelp.class noprint outp=step1 ;
var _numeric_;
run;
proc print data=step1;
run;
proc transpose data=step1 name=var2
out=want(keep=var2 _name_ col1
rename=(col1=corr _name_=var1)
where=(var1 < var2)
)
;
where _type_='CORR';
by _name_ notsorted ;
var _numeric_ ;
run;
proc print data=want;
run;
Results:
The SAS System Obs _TYPE_ _NAME_ Age Height Weight 1 MEAN 13.3158 62.3368 100.026 2 STD 1.4927 5.1271 22.774 3 N 19.0000 19.0000 19.000 4 CORR Age 1.0000 0.8114 0.741 5 CORR Height 0.8114 1.0000 0.878 6 CORR Weight 0.7409 0.8778 1.000 The SAS System Obs var1 var2 corr 1 Age Height 0.81143 2 Age Weight 0.74089 3 Height Weight 0.87779
Thank you. I tried your code and it writes only one "string" for each line. It means that it doens't append to a different table all the correlations that were calculated (On the other hand it doensn't contains the second loop)
I apologise if there is something I should do/understand (I'm newbie)
@ unison
Thank you. The problem with your solution is: If my table has 500 features?
Thank you. It's perfect!
Thank you. It's perfect!
THANK YOU!
I humbly thank you for the answers and how quickly you gave them.
@Krauss wrote:
Thank you. I tried your code and it writes only one "string" for each line. It means that it doens't append to a different table all the correlations that were calculated (On the other hand it doensn't contains the second loop)
I apologise if there is something I should do/understand (I'm newbie)
I think it does everything you asked for. It only needs one loop, all the correlations are handled in one loop.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.