BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Krauss
Fluorite | Level 6

Hi all,

 

I admit that this question could be already answered before. At least, in separate parts, but.....

I'm a newbie 🙂 and can't join it together.

 

I need to do a proc corr through all variables from a data set (HAVE). I need to do it in a table with the results in pairs (if i use proc corr all variables it will create a triangular matrix  with repetitions - VAR1 with VAR 2 and Var 2 with var 1)

 

So, I want to make it with this steps:

 

1- A loop that gets the VAR1 name, and the VAR 2 and make a proc corr with this two variables and write it to a new table:

 

VAR1 (name); VAR2 (name); result

 

2- Next cicle: VAR1 and VAR 3, and so on:

 

VAR1 (name); VAR2 (name); result

VAR1 (name); VAR3 (name); result

.

.

.

VAR1 (name); VARN (name); result,                where N=count variables of the dataset (HAVE)

 

3- When this is complete, it should go to the next VAR and write:

VAR1 (name); VAR2 (name); result

VAR1 (name); VAR3 (name); result

...

VAR1 (name); VARN (name); result

+

VAR2 (name); VAR3 (name); result

VAR2 (name); VAR4 (name); result

...

VAR2 (name); VARN (name); result.

 

 

This means, that there is a loop that takes a VAR i  and a second loop witch begins in VAR i+1 (because i don´t need to repeat VAR2 with VAR 1 again)

 

Resuming,

1-get the variables names with a loop

2-get variables names inside previous loop

3-execute proc corr

4-Write the results (VAR names and r value) in a separate table.

 

The proc corr, should be something like this:

proc corr data=HAVE;
   var VAR1 VAR2;
 run;

 

THANK YOU IN ADVANCE FOR YOUR PRECIOUS THINKING TIME!

 

Krauss 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Why would you want to run PROC CORR multiple times when you can run it once and get all of the correlations at once?

You can just reformat the output if you want.  Here is a method using PROC TRANSPOSE.

proc corr data=sashelp.class noprint outp=step1 ;
  var _numeric_;
run;

proc print data=step1;
run;

proc transpose data=step1 name=var2 
  out=want(keep=var2 _name_ col1 
            rename=(col1=corr _name_=var1)
            where=(var1 < var2)
           ) 
;
  where _type_='CORR';
  by _name_ notsorted ;
  var _numeric_ ;
run;

proc print data=want;
run;

Results:

The SAS System

Obs    _TYPE_    _NAME_      Age       Height     Weight

 1      MEAN               13.3158    62.3368    100.026
 2      STD                 1.4927     5.1271     22.774
 3      N                  19.0000    19.0000     19.000
 4      CORR     Age        1.0000     0.8114      0.741
 5      CORR     Height     0.8114     1.0000      0.878
 6      CORR     Weight     0.7409     0.8778      1.000

The SAS System

Obs    var1       var2       corr

 1     Age       Height    0.81143
 2     Age       Weight    0.74089
 3     Height    Weight    0.87779

View solution in original post

6 REPLIES 6
PaigeMiller
Diamond | Level 26

If I am understanding you properly, you can work with the correlation matrix, and just loop through it to get the results you are asking for. There is no need to include PROC CORR in the loop; in fact including PROC CORR in the loop would be extremely inefficient.

 

proc corr nosimple data=sashelp.cars outp=corrs noprint;
    var _numeric_;
run;

data _null_;
    length string $ 128;
    set corrs(where=(_type_='CORR'));
	array x _numeric_;
	do i=1 to dim(x);
	    string = cat("Correlation Between ",trim(_name_)," and ",vname(x(i)),'= ',x(i));
		put string;
	end;
run;

If you don't want the correlations on the diagonal, don't write them out. If you only want the lower or upper part of the correlation matrix, you can eliminate the parts you don't want. Naturally, if you don't want results in this format, its easy to produce any other format you'd like, such as a SAS data set with each correlation in its own row.

--
Paige Miller
unison
Lapis Lazuli | Level 10

Here's an approach using a data _null_ step:

 

data _null_;
set sashelp.class(obs=1) indsname=dsname;
array num_vars(3) Age Height Weight;
do i=2 to dim(num_vars);
	call execute(cat(
		'proc corr data=',dsname,' out=','output_',vname(num_vars(1)),'_',vname(num_vars(i)),'; ',
			'var ',vname(num_vars(1)),' ',vname(num_vars(i)),';',
		'run;'
		)
	);
end;
run;

 

-unison

-unison
s_lassen
Meteorite | Level 14

So, you basically want to execute a PROC CORR for all the variables (or a well defined subset of variables) in a table?

 

The easiest thing is probably to use SQL to generate the code, e.g.:

proc sql noprint;
  select catx(' ','PROC CORR data=SASHELP.CLASS outs=',catx('_','corr',a.name,b.name),'noprint;var',a.name,b.name,';run;') into :doit separated by ';' 
  from dictionary.columns a join dictionary.columns b on
  a.libname=b.libname and a.memname=b.memname and a.name<b.name
  where a.libname='SASHELP' and a.memname='CLASS' and a.type='num' and b.type='num';
quit;

&doit
Tom
Super User Tom
Super User

Why would you want to run PROC CORR multiple times when you can run it once and get all of the correlations at once?

You can just reformat the output if you want.  Here is a method using PROC TRANSPOSE.

proc corr data=sashelp.class noprint outp=step1 ;
  var _numeric_;
run;

proc print data=step1;
run;

proc transpose data=step1 name=var2 
  out=want(keep=var2 _name_ col1 
            rename=(col1=corr _name_=var1)
            where=(var1 < var2)
           ) 
;
  where _type_='CORR';
  by _name_ notsorted ;
  var _numeric_ ;
run;

proc print data=want;
run;

Results:

The SAS System

Obs    _TYPE_    _NAME_      Age       Height     Weight

 1      MEAN               13.3158    62.3368    100.026
 2      STD                 1.4927     5.1271     22.774
 3      N                  19.0000    19.0000     19.000
 4      CORR     Age        1.0000     0.8114      0.741
 5      CORR     Height     0.8114     1.0000      0.878
 6      CORR     Weight     0.7409     0.8778      1.000

The SAS System

Obs    var1       var2       corr

 1     Age       Height    0.81143
 2     Age       Weight    0.74089
 3     Height    Weight    0.87779

Krauss
Fluorite | Level 6


@PaigeMiller

Thank you. I tried your code and it writes only one "string" for each line. It means that it doens't append to a different table all the correlations that were calculated (On the other hand it doensn't contains the second loop)
I apologise if there is something I should do/understand (I'm newbie)

@ unison

Thank you. The problem with your solution is: If my table has 500 features?

@s_lassen

Thank you. It's perfect!

@Tom

Thank you. It's perfect!

 


THANK YOU!

I humbly thank you for the answers and how quickly you gave them.

PaigeMiller
Diamond | Level 26

@Krauss wrote:


@PaigeMiller

Thank you. I tried your code and it writes only one "string" for each line. It means that it doens't append to a different table all the correlations that were calculated (On the other hand it doensn't contains the second loop)
I apologise if there is something I should do/understand (I'm newbie)

I think it does everything you asked for. It only needs one loop, all the correlations are handled in one loop.

--
Paige Miller

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1021 views
  • 5 likes
  • 5 in conversation