Solved: Re: Create freq crosstab matrix and for each combination calculate per...

sivakoya · Posted 02-03-2017 11:34 AM

Hi , I was working with this before and this has been answered in the post

https://communities.sas.com/t5/SAS-Procedures/Create-frequency-count-crosstab-matrix-and-for-each-co...

I have a following dataset

ID   attribute1   attribute2   attribute3     BAD (Y-> yes)
1        1               0               0                 Y
2        1               1               0                 N
3        1               0               1                 Y
4        0               1               1                 N
5        1               1               0                 Y
6        1               1               0                 N
7        1               0               1                 Y
8        1               0               1                 N

I want to create frequency count matrix, for each combination of attribute tripped (atrribute = 1 means tripped) like following:

                         attribute1   attribute2    attribute3
attribute1                  _                3                  3
attribute2                 3                 _                 1
attribute3                 3                 1                  _

It represents counts of attributes tripped together.

Once i have the counts I need to calcualte %BAD in each combination like below.

                      attribute1     attribute2     attribute3
attribute1             _               12.50%         25%
attribute2        12.50%             _                   0%
attribute3            25%              0%                _

it represents out of 3 ID's for the combination atribute1 and atribute2 tripped, one ID went BAD -> so 12.5% BAD

And out of 3 ID's for the combination attribute1 and attribute1 tripped, two went BAD -> so 25% BAD

And the 1 ID for the combination attribute2 and attribute3 tripped, didnt went BAD -> so 0% BAD

I need to expand this analysis t0 3 variable combinations to get more meaningful results.

My dataset look like:

ID	grade	type	score	attribute1	attribute2	attribute3	attribute4	attribute5	attribute6	BAD (Y-yes/N- No)
1	A	1	100-110	1	0	0	1	0	0	Y
2	A	1	120-130	1	1	0	1	1	0	N
3	B	2	70-80	1	0	1	1	0	1	Y
4	C	3	30-40	0	1	1	0	1	1	N
5	A	3	40-50	1	1	0	1	1	0	Y
6	B	4	80-90	1	1	0	1	1	0	N
7	C	2	40-50	1	0	1	1	0	1	Y
8	C	4	30-40	1	0	1	1	0	1	N

Im trying to get to this analysis: I am showing values only for grade A as example:

				frequency Attributes/Attribute combinations
			Attribute1	Attribute2	Attribute4	Attr1&Attr4	Attr1&Attr2	Attr1&Attr2&Attr4
BAD	grade	score
yes	A	100-110	1		1	1
		110-120
		120-130
	B	90-100
		80-90
		70-80
	C	40-50
		30-40

No	A	100-110	2	1	2	2	1	1
		110-120
		120-130
	B	90-100
		80-90
		70-80
	C	40-50
		30-40

I was looking at http://www.datavis.ca/papers/sugi/mdarray/mdarray.pdf on mutildimensional arrays and mosaic displays. Wasn't quite sure how to get to such kind of analysis with this type of data. Any suggestions?

Thanks

AhmedAl_Attar · Posted 02-04-2017 10:33 PM

Try this

Run the TypeFormat macro declared in http://www2.sas.com/proceedings/sugi29/045-29.pdf, and then run this code

proc sort data=have out=srtd_have;
   by id grade type score_range bad;
run;

%TypeFormat(formatname=testtyp,var=a1 a2 a3 a4 a5);
proc summary data=WORK.srtd_have;
   by id grade type score_range bad;
   class a1-a5;
   ways 1 2 3 4 5;
   format _type_ testtyp.;
   output out=work.sum;
run;

Once the work.sum table is created, then you can slice&dice it as you want.

Hope this helps,

Ahmed

View solution in original post

rogerjdeangelis · Posted 02-03-2017 01:33 PM

SAS: Intersection counts for all combinations of column logicals (0/1)

Just the first part.

I recoded the 0s to missing so that proc corr would cout the non-missing intersecitons

proc summary may  work and be much faster?

HAVE
====

Up to 40 obs WORK.HAVE total obs=8

Obs    A1   A2   A3

 1      1    .    .
 2      1    1    .
 3      1    .    1
 4      .    1    1
 5      1    1    .
 6      1    1    .
 7      1    .    1
 8      1    .    1


WANT
===

 Up to 40 obs from mat total obs=3

 Obs    VARIABLE  NA1   NA2   NA3

  1        A1       7     3     3
  2        A2       3     4     1 -->  A2=1 and A3=1 se below
  3        A3       3     1     4

DETAILS
                    A2   A3

                     .    .
                     1    .
                     .    1
                     1    1    Only this one
                     1    .
                     1    .
                     .    1
                     .    1

WORKING CODE
============

     proc corr data=wrk.have ;
     var _numeric_;
     with _numeric_;


FULL SOLUTION
=============


options validvarname=upcase;
data have;
input a1 a2 a3;
cards4;
 1 . .
 1 1 .
 1 . 1
 . 1 1
 1 1 .
 1 1 .
 1 . 1
 1 . 1
;;;;
run;quit;


%utl_submit_wps64('
libname wrk "%sysfunc(pathname(work))";
ods output pearsoncorr=wrk.want(keep=variable n:) ;
proc corr data=wrk.have ;
var _numeric_;
with _numeric_;
run;quit;
');

proc print data=want width=min;
run;quit;

Obs    VARIABLE    NA1    NA2    NA3

 1        A1        7      3      3
 2        A2        3      4      1
 3        A3        3      1      4

sivakoya · Posted 02-03-2017 01:51 PM

@rogerjdeangelis thanks for the solution for variable intersections. I also wanted to at the correlation at the combination of these variables.

ex:

two combinations:

A1, A2

A1, A3

A1, A4

A1, A5

A2, A3

A2, A4

A2, A5

A3, A4

A3, A5

A4, A5

Three occuring together:

A1, A2, A3

A1, A2, A4

A1, A2, A5

A1, A3, A4

A1, A3, A5,

A1, A4, A5

Four occuring together:

A1, A2, A3, A4

A1, A2, A4, A5

A1, A3, A4, A5

All five :

A1, A2, A3, A4, A5

sivakoya · Posted 02-03-2017 02:05 PM

@rogerjdeangelis

So if we look at it like a report I need to build something like this. Counts for various combination of these occurances.

A1, A2

A1, A3

A1, A4

A1, A5

A2, A3

A2, A4

A2, A5

A3, A4

A3, A5

A4, A5

A1, A2, A3

A1, A2, A4

A1, A2, A5

A1, A3, A4

A1, A3, A5,

A1, A4, A5

A1, A2, A3, A4

A1, A2, A4, A5

A1, A3, A4, A5

A1, A2, A3, A4, A5

BAD

grade

score

yes

A

100-110

110-120

120-130

B

90-100

80-90

70-80

C

40-50

30-40

No

A

100-110

110-120

120-130

B

90-100

80-90

70-80

C

40-50

30-40

Thanks

rogerjdeangelis · Posted 02-03-2017 01:42 PM

SAS: Intersection counts for all combinations of column logicals (0/1)

Just the first part.

I recoded the 0s to missing so that proc corr would cout the non-missing intersecitons

proc summary may  work and be much faster?

HAVE
====

Up to 40 obs WORK.HAVE total obs=8

Obs    A1   A2   A3

 1      1    .    .
 2      1    1    .
 3      1    .    1
 4      .    1    1
 5      1    1    .
 6      1    1    .
 7      1    .    1
 8      1    .    1


WANT
===

 Up to 40 obs from mat total obs=3

 Obs    VARIABLE  NA1   NA2   NA3

  1        A1       7     3     3
  2        A2       3     4     1 -->  A2=1 and A3=1 se below
  3        A3       3     1     4

DETAILS
                    A2   A3

                     .    .
                     1    .
                     .    1
                     1    1    Only this one
                     1    .
                     1    .
                     .    1
                     .    1

WORKING CODE
============

     proc corr data=wrk.have ;
     var _numeric_;
     with _numeric_;


FULL SOLUTION
=============


options validvarname=upcase;
data have;
input a1 a2 a3;
cards4;
 1 . .
 1 1 .
 1 . 1
 . 1 1
 1 1 .
 1 1 .
 1 . 1
 1 . 1
;;;;
run;quit;


%utl_submit_wps64('
libname wrk "%sysfunc(pathname(work))";
ods output pearsoncorr=wrk.want(keep=variable n:) ;
proc corr data=wrk.have ;
var _numeric_;
with _numeric_;
run;quit;
');

proc print data=want width=min;
run;quit;

Obs    VARIABLE    NA1    NA2    NA3

 1        A1        7      3      3
 2        A2        3      4      1
 3        A3        3      1      4

sivakoya · Posted 02-03-2017 04:41 PM

data have;

input id grade$ type score_range$ a1$ a2$ a3$ a4$ a5$ bad$;

datalines;

1 a 1 140-150 1 0 0 0 0 Y

2 a 1 130-140 1 1 0 0 0 N

3 a 2 130-140 1 1 0 1 0 N

4 a 3 140-150 1 0 0 1 1 Y

5 b 2 120-130 1 1 1 1 0 N

6 b 1 110-120 0 0 1 0 0 N

7 b 1 120-130 0 0 1 1 0 N

8 c 1 90-100 1 1 0 0 0 Y

9 a 1 130-140 1 0 0 0 0 Y

10 a 1 130-140 1 0 0 0 0 Y

;

run;

data have2;

set have;

format a $20.;

if a1=0 then a1='';

if a2=0 then a2='';

if a3=0 then a3='';

if a4=0 then a4='';

if a5=0 then a5='';

if a1=1 then a1='a1';

if a2=1 then a2='a2';

if a3=1 then a3='a3';

if a4=1 then a4='a4';

if a5=1 then a5='a5';

a = catx('*',a1,a2,a3,a4,a5);

run;

proc freq data=have2;

tables grade*type*score_range* a *bad / missing list;

run;

I am able to get till here. results of proc freq:

grade	type	score_range	a	bad	Frequency
a	1	130-140	a1	Y	2
a	1	130-140	a1*a2	N	1
a	1	140-150	a1	Y	1
a	2	130-140	a1a2a4	N	1
a	3	140-150	a1a4a5	Y	1
b	1	110-120	a3	N	1
b	1	120-130	a3*a4	N	1
b	2	120-130	a1a2a3*a4	N	1
c	1	90-100	a1*a2	Y	1

I need to lay this out in a proc report format so i can get to :

BAD	Grade	Type	scorerange	a1	a2	a3	a4	a5	a1*a2	a1*a3	a1*a4	a1*a5
Y	A	1	140-150
			130-140
		2	130-140
		3	140-150
	B	1	110-120
			120-130
		2	120-130
	C	1	90-100
N	A	1	140-150
			130-140
		2	130-140
		3	140-150
	B	1	110-120
			120-130
		2	120-130
	C	1	90-100

Thanks

AhmedAl_Attar · Posted 02-04-2017 10:33 PM

Try this

Run the TypeFormat macro declared in http://www2.sas.com/proceedings/sugi29/045-29.pdf, and then run this code

proc sort data=have out=srtd_have;
   by id grade type score_range bad;
run;

%TypeFormat(formatname=testtyp,var=a1 a2 a3 a4 a5);
proc summary data=WORK.srtd_have;
   by id grade type score_range bad;
   class a1-a5;
   ways 1 2 3 4 5;
   format _type_ testtyp.;
   output out=work.sum;
run;

Once the work.sum table is created, then you can slice&dice it as you want.

Hope this helps,

Ahmed

sivakoya · Posted 02-07-2017 04:02 PM

@AhmedAl_Attar thanks

Create freq crosstab matrix and for each combination calculate percentages of target var

Re: Create freq crosstab matrix and for each combination calculate percentages of target var

Re: Create freq crosstab matrix and for each combination calculate percentages of target var

Re: Create freq crosstab matrix and for each combination calculate percentages of target var

Re: Create freq crosstab matrix and for each combination calculate percentages of target var

Re: Create freq crosstab matrix and for each combination calculate percentages of target var

Re: Create freq crosstab matrix and for each combination calculate percentages of target var

Re: Create freq crosstab matrix and for each combination calculate percentages of target var

Re: Create freq crosstab matrix and for each combination calculate percentages of target var

The 2025 SAS Hackathon has begun!

SAS Training: Just a Click Away