BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Ronein
Onyx | Level 15

Hello

I want to run a code that calculate Kramer correlation matrix between categorical variables.

I have a question please about improving the code.

My code check correlation between any pair of variables 

In my example there are 4 variables X Y Z R so it calculate correlation between:

X,X    X,Y   X,Z   X,R

Y,Y    Y,X   Y,Z    Y,R

Z,Z    Z,X   Z,Y   Z,R

R,R   R,X   R,Y   R,Z

So it calculate 16 correlations

However, correlation of X,Y  is same as correlation of Y,X (and so on)

How can we improve the code that it will calculate the correlation matrix quicker?

 

Another question-

I have a warning message -

WARNING: 31% of the cells have expected counts less than 5, for the table of X  by _X
Chi-Square may not be a valid test.

 

What is the way to solve it?

 

 

 
/****categorical Variables list****/
%let list1var=X Y Z R;
%let list2var=&list1var.;

/**Number of varaibles***/
%let n =%sysfunc(countw(&list1var));
%put n=&n; 
%macro CRAMV;
%do i = 1 %to &n.;
%let var1=%scan(&list1var.,&i.,+);
%do j = 1 %to &n.;
%let var2=%scan(&list2var.,&j.,+);
ods select none;
PROC FREQ DATA=Raw_Data_tbl;
TABLES &VAR1.* (&VAR2.)/CHISQ MISSING NOPERCENT NOCOL; 
OUTPUT OUT=ttt CRAMV ;
RUN;

DATA want; 
RETAIN VAR1 VAR2;
SET ttt;
LENGTH VAR1 VAR2 $32.;
VAR1 ="&VAR1";
VAR2 ="&VAR2";
RUN;

PROC APPEND  DATA=want BASE=Kramer_All  force;quit;
%end;
%end;
%mend;
%CRAMV;

/***Create matrix***/
title;
ods select all;
proc tabulate data=Kramer_All;
class  VAR1  VAR2;
var _CRAMV_;
table (VAR1=''),VAR2=''*(_CRAMV_=''*min=''*f=6.5 );
run;

 

 

 

 

but it also calculate correlation between:

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
ods select none;
ods output  ChiSq= ChiSq;
PROC FREQ DATA=sashelp.heart(obs=1000);
TABLES (sex status bp_status)*(sex status bp_status)/CHISQ MISSING NOPERCENT NOCOL; 
RUN;
ods select all;
data ChiSq2;
 set ChiSq(where=(Statistic="Cramer's V"));
 row=scan(table,-2,' *');
 col=scan(table,-1,' *');
run;
proc tabulate data=ChiSq2 ;
class row col;
var value;
table row='',col=''*value=''*sum=''*f=6.3;
run;

Ksharp_0-1728541934202.png

 

 

"WARNING: 31% of the cells have expected counts less than 5, for the table of X  by _X
Chi-Square may not be a valid test."

Yes. You need EXACT Chisquare Test , Like:

PROC FREQ DATA=sashelp.heart(obs=1000);
TABLES sex *status/CHISQ MISSING NOPERCENT NOCOL  ; 
exact chisq  ;
RUN;

View solution in original post

9 REPLIES 9
Ksharp
Super User
ods select none;
ods output  ChiSq= ChiSq;
PROC FREQ DATA=sashelp.heart(obs=1000);
TABLES (sex status bp_status)*(sex status bp_status)/CHISQ MISSING NOPERCENT NOCOL; 
RUN;
ods select all;
data ChiSq2;
 set ChiSq(where=(Statistic="Cramer's V"));
 row=scan(table,-2,' *');
 col=scan(table,-1,' *');
run;
proc tabulate data=ChiSq2 ;
class row col;
var value;
table row='',col=''*value=''*sum=''*f=6.3;
run;

Ksharp_0-1728541934202.png

 

 

"WARNING: 31% of the cells have expected counts less than 5, for the table of X  by _X
Chi-Square may not be a valid test."

Yes. You need EXACT Chisquare Test , Like:

PROC FREQ DATA=sashelp.heart(obs=1000);
TABLES sex *status/CHISQ MISSING NOPERCENT NOCOL  ; 
exact chisq  ;
RUN;
Ronein
Onyx | Level 15

Thank you so much.
So as I understand only one change is needed to change  chisq to exact chisq.

(What is the difference in the statistical names? I want to ready the theory)

 

About my question how to improve running time of kramer matrix. 

What is the way?

Ksharp
Super User
Yes. But that would cost you lots of time if you have many obs.
Cramer's V statistic should be the same .but ChiSquare Test would be replace by Exact Fisher Test.
If you want its theory/algorithm ,check PROC FREQ 's documentation.

"how to improve running time of kramer matrix. "
What do you mean by that ? PROC FREQ calculated Cramer's V statistic should have CONSTANT /SAME time interval once the obs number is specified .
Mine would be faster than yours due to not use macro statement.
If you really want to get faster ,then you need write your own code(or IML code) to calcuate Cramer's V statistic. and @Rick_SAS might have an interesting to do that.
Ronein
Onyx | Level 15

Thanks

The question is by using my code ( user define the variables via macro variable) how to cause the code to calculate only one time the correlation between 2 variables 

(In my code for example calculated correlation between x  and y and then between y and x   but they are same...)

 

Rick_SAS
SAS Super FREQ

You seem to be asking about how to write a macro loop that iterates over (i,j) pairs where i <= j.  Look at this example. Hope it helps.

 

 

%let n = 4;
%macro TRI;
%do i = 1 %to &n.;
   %do j = &i. %to &n.;
      %put &=i, &=j;
   %end;
%end;
%mend;
%TRI;

The key is to start to J loop at &i instead of at 1.

 

Ronein
Onyx | Level 15

Thank you.

The correlation matrix is not full (not all arguments have value) when I apply this method.

I need to define that Correlation between I and J  equal to  Correlation between J and I

I used this way that is working 100%

Any idea of any other way to fill the all arguments in matrix?

 

%let list1var=sex+status+bp_status;
%let list2var=&list1var.;
%let n =%sysfunc(countw(&list1var));

%macro CRAMV;
	%do i = 1 %to &n.;
%let var1=%scan(&list1var.,&i.,+);
	%do j = &i. %to &n.;
%let var2=%scan(&list2var.,&j.,+);
ods select none;
PROC FREQ DATA=sashelp.heart;
TABLES &VAR1.* (&VAR2.)   /CHISQ MISSING NOPERCENT NOCOL; 
OUTPUT OUT=ttt CRAMV ;
RUN;

DATA ttt&i.&j.; 
RETAIN VAR1 VAR2;
SET ttt;
LENGTH VAR1 VAR2 $32.;
VAR1 ="&VAR1";
VAR2 ="&VAR2";
RUN;

DATA RRR&j.&i.(DROP=VAR1 VAR2  rename=(_VAR2_=VAR2  _VAR1_=VAR1)); 
RETAIN VAR1 VAR2;
SET ttt&i.&j.;
LENGTH VAR1 VAR2 $32.;
_VAR2_ =VAR1;
_VAR1_ =VAR2;
RUN;

Data qqq;
set ttt&i.&j. RRR&j.&i.;
Run;

PROC APPEND  DATA=qqq BASE=Kramer_All_Way2  force;
RUN;
%end;
%end;
%mend;
%CRAMV;
title;
ods select all;
proc tabulate data=Kramer_All_Way2;
class  VAR1  VAR2;
var _CRAMV_;
table (VAR1=''),VAR2='' *(_CRAMV_=''*min=''*f=6.5 );
run;

 

 

Ksharp
Super User
As Rick showed you ,
using combination ,not permutation :

%do i = 1 %to &n.;
%let var1=%scan(&list1var.,&i.,+);
%do j = %eval(&i.+1) %to &n.;
%let var2=%scan(&list2var.,&j.,+);
Ronein
Onyx | Level 15

I want to summarize the ways to create karmer matrix ,which calculate correlation between categorical variables.

 

In my original code  I calculated the correlation between each pair of variables.

In the real life with 100,000 rows and 20 variables it took long time ( 10 minutes or more) to run it.

 

Then I saw how to improve it by calculate only once the correlation between any pair ( so no need to calculate correlation between X and W and W and X. It is enought to calculate only between X and W)

 

Then  I saw that can calculate the correlation between all variables in one step  and it took 1 minute or less to calculate it. 

Thank you Mr. Ksharp!

Really big time saving!

 

/***************WAY1---Very slow running time**********************************/
/***************WAY1---Very slow running time**********************************/
/***************WAY1---Very slow running time**********************************/
proc delete data=work._all_;Run;

%let list1var=sex+status+bp_status;
%let list2var=&list1var.;
%let n =%sysfunc(countw(&list1var));

%macro CRAMV;
%do i = 1 %to &n.;
%let var1=%scan(&list1var.,&i.,+);
%do j = 1 %to &n.;
%let var2=%scan(&list2var.,&j.,+);
ods select none;
PROC FREQ DATA=sashelp.heart;
TABLES &VAR1.* (&VAR2.)   /CHISQ MISSING NOPERCENT NOCOL; 
OUTPUT OUT=ttt CRAMV ;
RUN;

DATA ttt; 
RETAIN VAR1 VAR2;
SET ttt;
LENGTH VAR1 VAR2 $32.;
VAR1 ="&VAR1";
VAR2 ="&VAR2";
RUN;

PROC APPEND  DATA=ttt BASE=Kramer_All_Way1  force;
RUN;
%end;
%end;
%mend;
%CRAMV;
title;
ods select all;
proc tabulate data=Kramer_All_Way1;
class  VAR1  VAR2;
var _CRAMV_;
table (VAR1=''),VAR2='' *(_CRAMV_=''*min=''*f=6.5 );
run;


/***************WAY2---Very slow running time**********************************/
/***************WAY2---Very slow running time**********************************/
/***************WAY2---Very slow running time**********************************/
proc delete data=work._all_;Run;

%let list1var=sex+status+bp_status;
%let list2var=&list1var.;
%let n =%sysfunc(countw(&list1var));

%macro CRAMV;
	%do i = 1 %to &n.;
%let var1=%scan(&list1var.,&i.,+);
	%do j = &i. %to &n.;
%let var2=%scan(&list2var.,&j.,+);
ods select none;
PROC FREQ DATA=sashelp.heart;
TABLES &VAR1.* (&VAR2.)   /CHISQ MISSING NOPERCENT NOCOL; 
OUTPUT OUT=ttt CRAMV ;
RUN;

DATA ttt&i.&j.; 
RETAIN VAR1 VAR2;
SET ttt;
LENGTH VAR1 VAR2 $32.;
VAR1 ="&VAR1";
VAR2 ="&VAR2";
RUN;

DATA RRR&j.&i.(DROP=VAR1 VAR2  rename=(_VAR2_=VAR2  _VAR1_=VAR1)); 
RETAIN VAR1 VAR2;
SET ttt&i.&j.;
LENGTH VAR1 VAR2 $32.;
_VAR2_ =VAR1;
_VAR1_ =VAR2;
RUN;

Data qqq;
set ttt&i.&j. RRR&j.&i.;
Run;

PROC APPEND  DATA=qqq BASE=Kramer_All_Way2  force;
RUN;
%end;
%end;
%mend;
%CRAMV;
title;
ods select all;
proc tabulate data=Kramer_All_Way2;
class  VAR1  VAR2;
var _CRAMV_;
table (VAR1=''),VAR2='' *(_CRAMV_=''*min=''*f=6.5 );
run;


/****Other way for way2 to fill the matrix with all arguments****/
/****Other way for way2 to fill the matrix with all arguments****/
/****Other way for way2 to fill the matrix with all arguments****/
proc delete data=work._all_;Run;

%let list1var=sex+status+bp_status;
%let list2var=&list1var.;
%let n =%sysfunc(countw(&list1var));
%macro CRAMV;
	%do i = 1 %to &n.;
%let var1=%scan(&list1var.,&i.,+);
	%do j = &i. %to &n.;
%let var2=%scan(&list2var.,&j.,+);
ods select none;
PROC FREQ DATA=sashelp.heart;
TABLES &VAR1.* (&VAR2.)   /CHISQ MISSING NOPERCENT NOCOL; 
OUTPUT OUT=ttt CRAMV ;
RUN;
DATA ttt; 
RETAIN VAR1 VAR2;
SET ttt;
LENGTH VAR1 VAR2 $32.;
VAR1 ="&VAR1";
VAR2 ="&VAR2";
RUN;
PROC APPEND  DATA=ttt BASE=Kramer_All_Way2b  force;
RUN;
%end;
%end;
%mend;
%CRAMV;
title;
ods select all;

 
data Kramer_All_Way2b;
set Kramer_All_Way2b;
/***Add Rows **/
output;
if var1 ne var2 then do;
temp=var1;
var1=var2;
var2=temp;
output;
end;
drop temp;
run;


proc tabulate data=Kramer_All_Way2b;
class  VAR1  VAR2;
var _CRAMV_;
table (VAR1=''),VAR2='' *(_CRAMV_=''*min=''*f=6.5 );
run;


/*****************WAY3---Much better running time!!!*******************/
/*****************WAY3---Much better running time!!!*******************/
/*****************WAY3---Much better running time!!!*******************/
%let list_var=sex status bp_status;
ods select none;
ods output  ChiSq= ChiSq;
PROC FREQ DATA=sashelp.heart;
TABLES (&list_var.)*(&list_var.)/CHISQ MISSING NOPERCENT NOCOL; 
RUN;
ods select all;
data ChiSq2;
set ChiSq(where=(Statistic="Cramer's V"));
row=scan(table,-2,' *');
col=scan(table,-1,' *');
run;
proc tabulate data=ChiSq2 ;
class row col;
var value;
table row='',col=''*value=''*sum=''*f=6.5;
run;

Ksharp
Super User

Just add one more data step to impute these missing value.

%let list1var=sex+status+bp_status;
%let list2var=&list1var.;
%let n =%sysfunc(countw(&list1var));

%macro CRAMV;
	%do i = 1 %to &n.;
%let var1=%scan(&list1var.,&i.,+);
	%do j = &i. %to &n.;
%let var2=%scan(&list2var.,&j.,+);
ods select none;
PROC FREQ DATA=sashelp.heart;
TABLES &VAR1.* (&VAR2.)   /CHISQ MISSING NOPERCENT NOCOL; 
OUTPUT OUT=ttt CRAMV ;
RUN;

DATA ttt; 
RETAIN VAR1 VAR2;
SET ttt;
LENGTH VAR1 VAR2 $32.;
VAR1 ="&VAR1";
VAR2 ="&VAR2";
RUN;

PROC APPEND  DATA=ttt BASE=Kramer_All_Way2  force;
RUN;
%end;
%end;
%mend;
%CRAMV;
title;
ods select all;


data Kramer_All_Way2;
 set Kramer_All_Way2;
 output;
 if var1 ne var2 then do;
   temp=var1;var1=var2;var2=temp;output;
 end;
 drop temp;
run;
proc tabulate data=Kramer_All_Way2;
class  VAR1  VAR2;
var _CRAMV_;
table (VAR1=''),VAR2='' *(_CRAMV_=''*min=''*f=6.5 );
run;

Ksharp_0-1728638368703.png

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2723 views
  • 4 likes
  • 3 in conversation