- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
I would like to ask please a question.
In my source data set there are 10 binary variables (each varaible get value 1 or 0 ).
Each variable has a meaning of another overide businss rule.
I want to create a matrix report(10x10) with information of the correlation (assosiation) between the binary variables.
The target is to see which variables (Overide rules) are working together and which are not working together.
May anyone send an example how to do it .
Let's say that the user define the varaibles list in a macro varaiable.
%let List= Z1+Z2+Z3+Z4+Z5+Z6+Z7+Z8+Z9+Z10;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
See here whether this is what you want: https://data-flair.training/blogs/sas-correlation-analysis/
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Check COSINE correlation.
@Rick_SAS wrote blog about it .
https://blogs.sas.com/content/iml/2019/09/05/cosine-similarity-recommendations.html
https://blogs.sas.com/content/iml/2019/09/03/cosine-similarity.html
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You probably want the Phi coefficient, which is the equivalent of the Pearson correlation for binary variables. This can be computed in PROC FREQ for any pair of variables, and so if you really want a 10x10 table, you'd have to write a macro and loop over all possible pairs of variables.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
That is chisquare test .
And Phi Coefficient / Contingency Coefficient measure the association between two variables NOT correlation .
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In the case of two binary variables, I am not aware of any difference between "association" and "correlation".
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Show some example data.
Or we could use the example data in Rick Wiklen's blog. https://blogs.sas.com/content/iml/2019/09/05/cosine-similarity-recommendations.html
data recipes;
input Recipe $ 1-20
(Tomato Garlic Salt Onion TomatoPaste OliveOil Celery Broth
GreenPepper Cumin Flour BrownSugar BayLeaf GroundBeef
BlackPepper ChiliPowder Cilantro Carrot CayennePepper Oregano
Oil Parsley PorkSausage RedPepper Paprika Thyme Tomatillo
JalapenoPepper WorcestershireSauce Lime
Eggplant GreenOlives Capers Sugar) (1.);
datalines;
Spag Sauce 1111110000000000000101000000000000
Spag Meat Sauce 1111111010001100010000110000000000
Eggplant Relish 0111110000000000000000000000001111
Creole Sauce 1011111110000010000000001100100000
Salsa 1111000000000000100000000011010000
Enchilada Sauce 1000000101110001001010000000000000
;
To compare the COLUMNS instead of the ROWS just transpose the data first.
proc transpose data=recipes out=ingrediants name=ingrediant ;
id recipe ;
var _numeric_;
run;
proc distance data=ingrediants out=cos_ingrediants method=COSINE shape=square;
var ratio(_NUMERIC_);
id ingrediant;
run;
So in your case if you make your macro variable contain a space delimited list of variable names (instead of + delimited). You might even be able to use a variable list syntax.
Then the code is something like:
%let List= Z1-Z10;
proc transpose data=HAVE out=columns name=COLUMN;
var &list;
run;
proc distance data=columns out=cos method=COSINE shape=square;
var ratio(_NUMERIC_);
id COLUMN;
run;
proc print data=cos;
run;
So if I use RECIPES as the HAVE and set LIST= Tomato Garlic Salt Onion then the result is:
Obs COLUMN Tomato Garlic Salt Onion 1 Tomato 1.00000 0.67082 0.80000 0.80000 2 Garlic 0.67082 1.00000 0.89443 0.89443 3 Salt 0.80000 0.89443 1.00000 1.00000 4 Onion 0.80000 0.89443 1.00000 1.00000