Hello,
I would like to ask please a question.
In my source data set there are 10 binary variables (each varaible get value 1 or 0 ).
Each variable has a meaning of another overide businss rule.
I want to create a matrix report(10x10) with information of the correlation (assosiation) between the binary variables.
The target is to see which variables (Overide rules) are working together and which are not working together.
May anyone send an example how to do it .
Let's say that the user define the varaibles list in a macro varaiable.
%let List= Z1+Z2+Z3+Z4+Z5+Z6+Z7+Z8+Z9+Z10;
See here whether this is what you want: https://data-flair.training/blogs/sas-correlation-analysis/
Check COSINE correlation.
@Rick_SAS wrote blog about it .
https://blogs.sas.com/content/iml/2019/09/05/cosine-similarity-recommendations.html
https://blogs.sas.com/content/iml/2019/09/03/cosine-similarity.html
You probably want the Phi coefficient, which is the equivalent of the Pearson correlation for binary variables. This can be computed in PROC FREQ for any pair of variables, and so if you really want a 10x10 table, you'd have to write a macro and loop over all possible pairs of variables.
In the case of two binary variables, I am not aware of any difference between "association" and "correlation".
Show some example data.
Or we could use the example data in Rick Wiklen's blog. https://blogs.sas.com/content/iml/2019/09/05/cosine-similarity-recommendations.html
data recipes;
input Recipe $ 1-20
(Tomato Garlic Salt Onion TomatoPaste OliveOil Celery Broth
GreenPepper Cumin Flour BrownSugar BayLeaf GroundBeef
BlackPepper ChiliPowder Cilantro Carrot CayennePepper Oregano
Oil Parsley PorkSausage RedPepper Paprika Thyme Tomatillo
JalapenoPepper WorcestershireSauce Lime
Eggplant GreenOlives Capers Sugar) (1.);
datalines;
Spag Sauce 1111110000000000000101000000000000
Spag Meat Sauce 1111111010001100010000110000000000
Eggplant Relish 0111110000000000000000000000001111
Creole Sauce 1011111110000010000000001100100000
Salsa 1111000000000000100000000011010000
Enchilada Sauce 1000000101110001001010000000000000
;
To compare the COLUMNS instead of the ROWS just transpose the data first.
proc transpose data=recipes out=ingrediants name=ingrediant ;
id recipe ;
var _numeric_;
run;
proc distance data=ingrediants out=cos_ingrediants method=COSINE shape=square;
var ratio(_NUMERIC_);
id ingrediant;
run;
So in your case if you make your macro variable contain a space delimited list of variable names (instead of + delimited). You might even be able to use a variable list syntax.
Then the code is something like:
%let List= Z1-Z10;
proc transpose data=HAVE out=columns name=COLUMN;
var &list;
run;
proc distance data=columns out=cos method=COSINE shape=square;
var ratio(_NUMERIC_);
id COLUMN;
run;
proc print data=cos;
run;
So if I use RECIPES as the HAVE and set LIST= Tomato Garlic Salt Onion then the result is:
Obs COLUMN Tomato Garlic Salt Onion 1 Tomato 1.00000 0.67082 0.80000 0.80000 2 Garlic 0.67082 1.00000 0.89443 0.89443 3 Salt 0.80000 0.89443 1.00000 1.00000 4 Onion 0.80000 0.89443 1.00000 1.00000
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.