Solved: Create frequency count crosstab matrix and for each combination calcul...

sivakoya · Posted 09-29-2016 03:15 PM

I have a following dataset

ID   attribute1   attribute2   attribute3     BAD (Y-> yes)
1        1               0               0                 Y
2        1               1               0                 N
3        1               0               1                 Y
4        0               1               1                 N
5        1               1               0                 Y
6        1               1               0                 N
7        1               0               1                 Y
8        1               0               1                 N

I want to create frequency count matrix, for each combination of attribute tripped (atrribute = 1 means tripped) like following:

                         attribute1   attribute2    attribute3
attribute1                  _                3                  3
attribute2                 3                 _                 1
attribute3                 3                 1                  _

It represents counts of attributes tripped together.

Once i have the counts I need to calcualte %BAD in each combination like below.

                      attribute1     attribute2     attribute3
attribute1             _               12.50%         25%
attribute2        12.50%             _                   0%
attribute3            25%              0%                _

it represents out of 3 ID's for the combination atribute1 and atribute2 tripped, one ID went BAD -> so 12.5% BAD

And out of 3 ID's for the combination attribute1 and attribute1 tripped, two went BAD -> so 25% BAD

And the 1 ID for the combination attribute2 and attribute3 tripped, didnt went BAD -> so 0% BAD

Thanks

Ksharp · Posted 09-30-2016 12:29 AM

It is easy for IML.


data have;
input ID   attribute1   attribute2   attribute3     BAD $;
datalines;
1        1               0               0                 Y
2        1               1               0                 N
3        1               0               1                 Y
4        0               1               1                 N
5        1               1               0                 Y
6        1               1               0                 N
7        1               0               1                 Y
8        1               0               1                 N
;

proc iml;
use have nobs nobs;
read all var{attribute1 attribute2 attribute3} into x[c=vnames];
read all var{bad};
close;

xpx=x`*x;
diag=do(1,ncol(xpx)##2,ncol(xpx)+1);
xpx[diag]=.;
print xpx[r=vnames c=vnames l=''];

idx=loc(bad='Y');
temp=x[idx,];
per=(temp`*temp)/nobs;
per[diag]=.;
print per[r=vnames c=vnames l='' f=percent8.2];

quit;

View solution in original post

Reeza · Posted 09-29-2016 06:22 PM

For part 1 you can use the output from Proc Corr - though you'll have to set the diagonal to missing manually.

data have;
input ID   var1 var2 var3   BAD $;
cards;
1        1               0               0                 Y
2        1               1               0                 N
3        1               0               1                 Y
4        0               1               1                 N
5        1               1               0                 Y
6        1               1               0                 N
7        1               0               1                 Y
8        1               0               1                 N
;
run;

ods output sscp=distance;
proc corr data=have sscp;
var var1-var3;
run;
proc print data=distance;
run;

https://gist.github.com/statgeek/a5184a4e1678d81e2643

sivakoya · Posted 09-29-2016 06:39 PM

Thank you for the solution. Any ideas on the second part?

Reeza · Posted 09-29-2016 06:52 PM

Nothing simple.

Perhaps proc freq?

Here's a simple example but would need modification for the two way table.

https://gist.github.com/statgeek/e0903d269d4a71316a4e

sivakoya · Posted 09-29-2016 07:48 PM

thanks. will try that.

tried this:

PROC TABULATE

DATA=WORK.TEST

;

CLASS attribute1 / ORDER=UNFORMATTED MISSING;

CLASS attribute2 / ORDER=UNFORMATTED MISSING;

CLASS attribute3 / ORDER=UNFORMATTED MISSING;

TABLE /* Row Dimension */

attribute3

attribute1

attribute2,

/* Column Dimension */

attribute3

attribute1

attribute2;

;

FREQ BAD;

RUN;

but this gives. just need a way to supress proc tabulate output to just 1's.

attribute3 attribute1 attribute2
0 1 1 0 1

attribute3               2 .       2            1 1
0
1                        . 2       2             2 .
attribute1              2 2       4             3 1
1
attribute2               1 2       3             3 .
0
1                        1 .       1             . 1

PGStats · Posted 09-29-2016 10:58 PM

SQL and transpose can do this:

data have;
input ID   attribute1   attribute2   attribute3     BAD $;
datalines;
1        1               0               0                 Y
2        1               1               0                 N
3        1               0               1                 Y
4        0               1               1                 N
5        1               1               0                 Y
6        1               1               0                 N
7        1               0               1                 Y
8        1               0               1                 N
;

proc transpose data=have out=havet name=att;
by id bad;
var attribute:;
run;

proc sql;
create table wantt as
select 
    a.att, 
    b.att as col, 
    case when a.att=b.att then . 
         else sum(a.col1 and b.col1) end as n,
    case when a.att=b.att then . 
         else sum(a.col1 and b.col1 and a.BAD="Y") / 
            (select count(*) from have) end as p format=percent7.2
from 
    havet as a inner join 
    havet as b on a.id=b.id
group by a.att, b.att;
quit;

proc transpose data=wantt out=want1(drop=_name_);
by att;
id col;
var n;
run;

proc transpose data=wantt out=want2(drop=_name_);
by att;
id col;
var p;
run;

PG

sivakoya · Posted 10-04-2016 08:24 AM

Thank you @PGStats

Ksharp · Posted 09-30-2016 12:29 AM

It is easy for IML.


data have;
input ID   attribute1   attribute2   attribute3     BAD $;
datalines;
1        1               0               0                 Y
2        1               1               0                 N
3        1               0               1                 Y
4        0               1               1                 N
5        1               1               0                 Y
6        1               1               0                 N
7        1               0               1                 Y
8        1               0               1                 N
;

proc iml;
use have nobs nobs;
read all var{attribute1 attribute2 attribute3} into x[c=vnames];
read all var{bad};
close;

xpx=x`*x;
diag=do(1,ncol(xpx)##2,ncol(xpx)+1);
xpx[diag]=.;
print xpx[r=vnames c=vnames l=''];

idx=loc(bad='Y');
temp=x[idx,];
per=(temp`*temp)/nobs;
per[diag]=.;
print per[r=vnames c=vnames l='' f=percent8.2];

quit;

sivakoya · Posted 09-30-2016 08:41 AM

@Ksharp this is awesome! Thanks for your help on this.

rogerjdeangelis · Posted 10-03-2016 03:10 PM

Survival Probabilities for combinations of risk factors

for an R solution based on the SAS Forum IML solution see
https://listserv.uga.edu/cgi-bin/wa?A2=SAS-L;f541a941.1610a

Slightly reformated

HAVE I have a following dataset

ID   Male Smoker Over60  Survived
TED   1       0       0        1
JIM   1       1       0        0
TEX   1       0       1        1
LIZ   0       1       1        0
JOE   1       1       0        1 Out of all Male Smokers only Joe survived 1/8 12.5% survival rate
TIM   1       1       0        0
TAD   1       0       1        1
TOM   1       0       1        0


WANT

For Male Smokers Jim, Joe and Tim. Only Joe survived 1/8 = .125
For Males  over 60 Tex, Tad and Tom, Tex and Tad Survived = 2/8 = .25
For Smokers over 60. NO one survived  0.0


           MALE  SMOKER    OVER60
MALE         _   12.50%       25%  (TAD + TOM survived)/8 records
SMOKER  12.50%       _         0%
OVER60     25%       0%         _

SAS SOLUTION
============

* after normalizing you can generalize with many sas procedures
  beter solution then this one?;

data _null_;
  retain  survive1 survive2 survive3 0;
  format survive: percent7.1;
  infile datalines eof=dne;
  input ID $3.  Male Smoker Over60  Survived;
  grp1='Male_Smoker_Survive   ';survive1=survive1+male*smoker*survived;
  grp2='Male_Over60_Survive   ';survive2=survive2+male*over60*survived;
  grp3='Smoker_Over60_Survive ';survive3=survive3+smoker*over60*survived;
  return;
  dne:
     _n_=_n_-1;
     survive1=survive1/_n_;
     survive2=survive2/_n_;
     survive3=survive3/_n_;
     put              @10 "Male "    @20 "Smoker"  @30 "Over60"   ;
     put @1 "Male"    @10 "-"        @20 Survive1  @30 Survive2 ;
     put @1 "Smoker"  @10 Survive1   @20 "-"       @30 Survive3 ;
     put @1 "Over60"  @10 Survive2   @20 Survive3  @30 "-"        ;
datalines;
TED 1 0 0 1
JIM 1 1 0 0
TEX 1 0 1 1
LIZ 0 1 1 0
JOE 1 1 0 1
TIM 1 1 0 0
TAD 1 0 1 1
TOM 1 0 1 0
;;;;
run;quit;

         Male      Smoker    Over60
Male     -         12.5%     25.0%
Smoker   12.5%     -         0.0%
Over60   25.0%     0.0%      -

sivakoya · Posted 10-04-2016 04:41 PM

@Ksharp

idx=loc(bad='Y');
temp=x[idx,];

Hi, If the dataset doesnt have any Bad='Y'. I am getting the below error.

MPRINT(STIP): temp=x[idx,];

ERROR: (execution) Matrix has not been set to a value.

Is there any work around to this?

Thanks

Ksharp · Posted 10-04-2016 11:08 PM


Oh. Sure. Try this one .


data have;
input ID   attribute1   attribute2   attribute3     BAD $;
datalines;
1        1               0               0                 Y
2        1               1               0                 N
3        1               0               1                 Y
4        0               1               1                 N
5        1               1               0                 Y
6        1               1               0                 N
7        1               0               1                 Y
8        1               0               1                 N
;

proc iml;
use have nobs nobs;
read all var{attribute1 attribute2 attribute3} into x[c=vnames];
read all var{bad};
close;

xpx=x`*x;
diag=do(1,ncol(xpx)##2,ncol(xpx)+1);
xpx[diag]=.;
print xpx[r=vnames c=vnames l=''];

idx=loc(bad='Y');
if isempty(idx) then print "Bad don't have a Y";
else do;
temp=x[idx,];
per=(temp`*temp)/nobs;
per[diag]=.;
print per[r=vnames c=vnames l='' f=percent8.2];
end;
quit;

Create frequency count crosstab matrix and for each combination calculate percentages of target var

Re: Create frequency count crosstab matrix and for each combination calculate percentages of target

Re: Create frequency count crosstab matrix and for each combination calculate percentages of target

Re: Create frequency count crosstab matrix and for each combination calculate percentages of target

Re: Create frequency count crosstab matrix and for each combination calculate percentages of target

Re: Create frequency count crosstab matrix and for each combination calculate percentages of target

Re: Create frequency count crosstab matrix and for each combination calculate percentages of target

Re: Create frequency count crosstab matrix and for each combination calculate percentages of target

Re: Create frequency count crosstab matrix and for each combination calculate percentages of target

Re: Create frequency count crosstab matrix and for each combination calculate percentages of target

Re: Create frequency count crosstab matrix and for each combination calculate percentages of target

Re: Create frequency count crosstab matrix and for each combination calculate percentages of target

Re: Create frequency count crosstab matrix and for each combination calculate percentages of target

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away