Hi , I was working with this before and this has been answered in the post
I have a following dataset
ID attribute1 attribute2 attribute3 BAD (Y-> yes)
1 1 0 0 Y
2 1 1 0 N
3 1 0 1 Y
4 0 1 1 N
5 1 1 0 Y
6 1 1 0 N
7 1 0 1 Y
8 1 0 1 N
I want to create frequency count matrix, for each combination of attribute tripped (atrribute = 1 means tripped) like following:
attribute1 attribute2 attribute3
attribute1 _ 3 3
attribute2 3 _ 1
attribute3 3 1 _
It represents counts of attributes tripped together.
Once i have the counts I need to calcualte %BAD in each combination like below.
attribute1 attribute2 attribute3
attribute1 _ 12.50% 25%
attribute2 12.50% _ 0%
attribute3 25% 0% _
it represents out of 3 ID's for the combination atribute1 and atribute2 tripped, one ID went BAD -> so 12.5% BAD
And out of 3 ID's for the combination attribute1 and attribute1 tripped, two went BAD -> so 25% BAD
And the 1 ID for the combination attribute2 and attribute3 tripped, didnt went BAD -> so 0% BAD
I need to expand this analysis t0 3 variable combinations to get more meaningful results.
My dataset look like:
ID | grade | type | score | attribute1 | attribute2 | attribute3 | attribute4 | attribute5 | attribute6 | BAD (Y-yes/N- No) |
1 | A | 1 | 100-110 | 1 | 0 | 0 | 1 | 0 | 0 | Y |
2 | A | 1 | 120-130 | 1 | 1 | 0 | 1 | 1 | 0 | N |
3 | B | 2 | 70-80 | 1 | 0 | 1 | 1 | 0 | 1 | Y |
4 | C | 3 | 30-40 | 0 | 1 | 1 | 0 | 1 | 1 | N |
5 | A | 3 | 40-50 | 1 | 1 | 0 | 1 | 1 | 0 | Y |
6 | B | 4 | 80-90 | 1 | 1 | 0 | 1 | 1 | 0 | N |
7 | C | 2 | 40-50 | 1 | 0 | 1 | 1 | 0 | 1 | Y |
8 | C | 4 | 30-40 | 1 | 0 | 1 | 1 | 0 | 1 | N |
Im trying to get to this analysis: I am showing values only for grade A as example:
frequency Attributes/Attribute combinations | ||||||||
Attribute1 | Attribute2 | Attribute4 | Attr1&Attr4 | Attr1&Attr2 | Attr1&Attr2&Attr4 | |||
BAD | grade | score | ||||||
yes | A | 100-110 | 1 | 1 | 1 | |||
110-120 | ||||||||
120-130 | ||||||||
B | 90-100 | |||||||
80-90 | ||||||||
70-80 | ||||||||
C | 40-50 | |||||||
30-40 | ||||||||
No | A | 100-110 | 2 | 1 | 2 | 2 | 1 | 1 |
110-120 | ||||||||
120-130 | ||||||||
B | 90-100 | |||||||
80-90 | ||||||||
70-80 | ||||||||
C | 40-50 | |||||||
30-40 |
I was looking at http://www.datavis.ca/papers/sugi/mdarray/mdarray.pdf on mutildimensional arrays and mosaic displays. Wasn't quite sure how to get to such kind of analysis with this type of data. Any suggestions?
Thanks
Try this
Run the TypeFormat macro declared in http://www2.sas.com/proceedings/sugi29/045-29.pdf, and then run this code
proc sort data=have out=srtd_have;
by id grade type score_range bad;
run;
%TypeFormat(formatname=testtyp,var=a1 a2 a3 a4 a5);
proc summary data=WORK.srtd_have;
by id grade type score_range bad;
class a1-a5;
ways 1 2 3 4 5;
format _type_ testtyp.;
output out=work.sum;
run;
Once the work.sum table is created, then you can slice&dice it as you want.
Hope this helps,
Ahmed
SAS: Intersection counts for all combinations of column logicals (0/1)
Just the first part.
I recoded the 0s to missing so that proc corr would cout the non-missing intersecitons
proc summary may work and be much faster?
HAVE
====
Up to 40 obs WORK.HAVE total obs=8
Obs A1 A2 A3
1 1 . .
2 1 1 .
3 1 . 1
4 . 1 1
5 1 1 .
6 1 1 .
7 1 . 1
8 1 . 1
WANT
===
Up to 40 obs from mat total obs=3
Obs VARIABLE NA1 NA2 NA3
1 A1 7 3 3
2 A2 3 4 1 --> A2=1 and A3=1 se below
3 A3 3 1 4
DETAILS
A2 A3
. .
1 .
. 1
1 1 Only this one
1 .
1 .
. 1
. 1
WORKING CODE
============
proc corr data=wrk.have ;
var _numeric_;
with _numeric_;
FULL SOLUTION
=============
options validvarname=upcase;
data have;
input a1 a2 a3;
cards4;
1 . .
1 1 .
1 . 1
. 1 1
1 1 .
1 1 .
1 . 1
1 . 1
;;;;
run;quit;
%utl_submit_wps64('
libname wrk "%sysfunc(pathname(work))";
ods output pearsoncorr=wrk.want(keep=variable n:) ;
proc corr data=wrk.have ;
var _numeric_;
with _numeric_;
run;quit;
');
proc print data=want width=min;
run;quit;
Obs VARIABLE NA1 NA2 NA3
1 A1 7 3 3
2 A2 3 4 1
3 A3 3 1 4
@rogerjdeangelis thanks for the solution for variable intersections. I also wanted to at the correlation at the combination of these variables.
ex:
two combinations:
A1, A2
A1, A3
A1, A4
A1, A5
A2, A3
A2, A4
A2, A5
A3, A4
A3, A5
A4, A5
Three occuring together:
A1, A2, A3
A1, A2, A4
A1, A2, A5
A1, A3, A4
A1, A3, A5,
A1, A4, A5
Four occuring together:
A1, A2, A3, A4
A1, A2, A4, A5
A1, A3, A4, A5
All five :
A1, A2, A3, A4, A5
So if we look at it like a report I need to build something like this. Counts for various combination of these occurances.
A1, A2 | A1, A3 | A1, A4 | A1, A5 | A2, A3 | A2, A4 | A2, A5 | A3, A4 | A3, A5 | A4, A5 | A1, A2, A3 | A1, A2, A4 | A1, A2, A5 | A1, A3, A4 | A1, A3, A5, | A1, A4, A5 | A1, A2, A3, A4 | A1, A2, A4, A5 | A1, A3, A4, A5 | A1, A2, A3, A4, A5 | |||
BAD | grade | score | ||||||||||||||||||||
yes | A | 100-110 | ||||||||||||||||||||
110-120 | ||||||||||||||||||||||
120-130 | ||||||||||||||||||||||
B | 90-100 | |||||||||||||||||||||
80-90 | ||||||||||||||||||||||
70-80 | ||||||||||||||||||||||
C | 40-50 | |||||||||||||||||||||
30-40 | ||||||||||||||||||||||
No | A | 100-110 | ||||||||||||||||||||
110-120 | ||||||||||||||||||||||
120-130 | ||||||||||||||||||||||
B | 90-100 | |||||||||||||||||||||
80-90 | ||||||||||||||||||||||
70-80 | ||||||||||||||||||||||
C | 40-50 | |||||||||||||||||||||
30-40 |
Thanks
SAS: Intersection counts for all combinations of column logicals (0/1)
Just the first part.
I recoded the 0s to missing so that proc corr would cout the non-missing intersecitons
proc summary may work and be much faster?
HAVE
====
Up to 40 obs WORK.HAVE total obs=8
Obs A1 A2 A3
1 1 . .
2 1 1 .
3 1 . 1
4 . 1 1
5 1 1 .
6 1 1 .
7 1 . 1
8 1 . 1
WANT
===
Up to 40 obs from mat total obs=3
Obs VARIABLE NA1 NA2 NA3
1 A1 7 3 3
2 A2 3 4 1 --> A2=1 and A3=1 se below
3 A3 3 1 4
DETAILS
A2 A3
. .
1 .
. 1
1 1 Only this one
1 .
1 .
. 1
. 1
WORKING CODE
============
proc corr data=wrk.have ;
var _numeric_;
with _numeric_;
FULL SOLUTION
=============
options validvarname=upcase;
data have;
input a1 a2 a3;
cards4;
1 . .
1 1 .
1 . 1
. 1 1
1 1 .
1 1 .
1 . 1
1 . 1
;;;;
run;quit;
%utl_submit_wps64('
libname wrk "%sysfunc(pathname(work))";
ods output pearsoncorr=wrk.want(keep=variable n:) ;
proc corr data=wrk.have ;
var _numeric_;
with _numeric_;
run;quit;
');
proc print data=want width=min;
run;quit;
Obs VARIABLE NA1 NA2 NA3
1 A1 7 3 3
2 A2 3 4 1
3 A3 3 1 4
data have;
input id grade$ type score_range$ a1$ a2$ a3$ a4$ a5$ bad$;
datalines;
1 a 1 140-150 1 0 0 0 0 Y
2 a 1 130-140 1 1 0 0 0 N
3 a 2 130-140 1 1 0 1 0 N
4 a 3 140-150 1 0 0 1 1 Y
5 b 2 120-130 1 1 1 1 0 N
6 b 1 110-120 0 0 1 0 0 N
7 b 1 120-130 0 0 1 1 0 N
8 c 1 90-100 1 1 0 0 0 Y
9 a 1 130-140 1 0 0 0 0 Y
10 a 1 130-140 1 0 0 0 0 Y
;
run;
data have2;
set have;
format a $20.;
if a1=0 then a1='';
if a2=0 then a2='';
if a3=0 then a3='';
if a4=0 then a4='';
if a5=0 then a5='';
if a1=1 then a1='a1';
if a2=1 then a2='a2';
if a3=1 then a3='a3';
if a4=1 then a4='a4';
if a5=1 then a5='a5';
a = catx('*',a1,a2,a3,a4,a5);
run;
proc freq data=have2;
tables grade*type*score_range* a *bad / missing list;
run;
I am able to get till here. results of proc freq:
grade | type | score_range | a | bad | Frequency |
a | 1 | 130-140 | a1 | Y | 2 |
a | 1 | 130-140 | a1*a2 | N | 1 |
a | 1 | 140-150 | a1 | Y | 1 |
a | 2 | 130-140 | a1*a2*a4 | N | 1 |
a | 3 | 140-150 | a1*a4*a5 | Y | 1 |
b | 1 | 110-120 | a3 | N | 1 |
b | 1 | 120-130 | a3*a4 | N | 1 |
b | 2 | 120-130 | a1*a2*a3*a4 | N | 1 |
c | 1 | 90-100 | a1*a2 | Y | 1 |
I need to lay this out in a proc report format so i can get to :
BAD | Grade | Type | scorerange | a1 | a2 | a3 | a4 | a5 | a1*a2 | a1*a3 | a1*a4 | a1*a5 |
Y | A | 1 | 140-150 | |||||||||
130-140 | ||||||||||||
2 | 130-140 | |||||||||||
3 | 140-150 | |||||||||||
B | 1 | 110-120 | ||||||||||
120-130 | ||||||||||||
2 | 120-130 | |||||||||||
C | 1 | 90-100 | ||||||||||
N | A | 1 | 140-150 | |||||||||
130-140 | ||||||||||||
2 | 130-140 | |||||||||||
3 | 140-150 | |||||||||||
B | 1 | 110-120 | ||||||||||
120-130 | ||||||||||||
2 | 120-130 | |||||||||||
C | 1 | 90-100 |
Thanks
Try this
Run the TypeFormat macro declared in http://www2.sas.com/proceedings/sugi29/045-29.pdf, and then run this code
proc sort data=have out=srtd_have;
by id grade type score_range bad;
run;
%TypeFormat(formatname=testtyp,var=a1 a2 a3 a4 a5);
proc summary data=WORK.srtd_have;
by id grade type score_range bad;
class a1-a5;
ways 1 2 3 4 5;
format _type_ testtyp.;
output out=work.sum;
run;
Once the work.sum table is created, then you can slice&dice it as you want.
Hope this helps,
Ahmed
@AhmedAl_Attar thanks
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.