BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sivakoya
Obsidian | Level 7

I have a following dataset

 

ID   attribute1   attribute2   attribute3     BAD (Y-> yes)
1        1               0               0                 Y
2        1               1               0                 N
3        1               0               1                 Y
4        0               1               1                 N
5        1               1               0                 Y
6        1               1               0                 N
7        1               0               1                 Y
8        1               0               1                 N

 

I want to create frequency count matrix, for each combination of attribute tripped (atrribute = 1 means tripped) like following:

 

                          attribute1    attribute2    attribute3
attribute1                  _                3                  3
attribute2                 3                 _                 1
attribute3                 3                 1                  _

 

It represents counts of attributes tripped together.

Once i have the counts I need to calcualte %BAD in each combination like below.

 


                      attribute1     attribute2     attribute3
attribute1             _               12.50%         25%
attribute2        12.50%             _                   0%
attribute3            25%              0%                _

 

it represents out of 3 ID's for the combination atribute1 and atribute2 tripped, one ID went BAD -> so 12.5% BAD

And out of 3 ID's for the combination attribute1 and attribute1 tripped, two went BAD -> so 25% BAD

And the 1 ID for the combination attribute2 and attribute3 tripped, didnt went BAD -> so 0% BAD

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
It is easy for IML.


data have;
input ID   attribute1   attribute2   attribute3     BAD $;
datalines;
1        1               0               0                 Y
2        1               1               0                 N
3        1               0               1                 Y
4        0               1               1                 N
5        1               1               0                 Y
6        1               1               0                 N
7        1               0               1                 Y
8        1               0               1                 N
;

proc iml;
use have nobs nobs;
read all var{attribute1 attribute2 attribute3} into x[c=vnames];
read all var{bad};
close;

xpx=x`*x;
diag=do(1,ncol(xpx)##2,ncol(xpx)+1);
xpx[diag]=.;
print xpx[r=vnames c=vnames l=''];

idx=loc(bad='Y');
temp=x[idx,];
per=(temp`*temp)/nobs;
per[diag]=.;
print per[r=vnames c=vnames l='' f=percent8.2];

quit;


View solution in original post

11 REPLIES 11
Reeza
Super User

For part 1 you can use the output from Proc Corr - though you'll have to set the diagonal to missing manually.

 

data have;
input ID   var1 var2 var3   BAD $;
cards;
1        1               0               0                 Y
2        1               1               0                 N
3        1               0               1                 Y
4        0               1               1                 N
5        1               1               0                 Y
6        1               1               0                 N
7        1               0               1                 Y
8        1               0               1                 N
;
run;

ods output sscp=distance;
proc corr data=have sscp;
var var1-var3;
run;
proc print data=distance;
run;

https://gist.github.com/statgeek/a5184a4e1678d81e2643

sivakoya
Obsidian | Level 7

Thank you for the solution. Any ideas on the second part?

Reeza
Super User

Nothing simple. 

Perhaps proc freq?

 

Here's a simple example but would need modification for the two way table.

https://gist.github.com/statgeek/e0903d269d4a71316a4e

 

sivakoya
Obsidian | Level 7

thanks. will try that.

 

tried this:

 

PROC TABULATE

DATA=WORK.TEST

;

CLASS attribute1 / ORDER=UNFORMATTED MISSING;

CLASS attribute2 / ORDER=UNFORMATTED MISSING;

CLASS attribute3 / ORDER=UNFORMATTED MISSING;

TABLE /* Row Dimension */

attribute3

attribute1

attribute2,

/* Column Dimension */

attribute3

attribute1

attribute2;

;

FREQ BAD;

RUN;

 

but this gives. just need a way to supress proc tabulate output to just 1's.

 

                  attribute3   attribute1  attribute2 
                         0 1       1             0 1

 

attribute3               2 .       2             1 1
0     
1                        . 2       2             2 .
attribute1               2 2       4             3 1
1      
attribute2               1 2       3             3 .
0     
1                        1 .       1             . 1

 

     
     
     
     
     
PGStats
Opal | Level 21

SQL and transpose can do this:

 

data have;
input ID   attribute1   attribute2   attribute3     BAD $;
datalines;
1        1               0               0                 Y
2        1               1               0                 N
3        1               0               1                 Y
4        0               1               1                 N
5        1               1               0                 Y
6        1               1               0                 N
7        1               0               1                 Y
8        1               0               1                 N
;

proc transpose data=have out=havet name=att;
by id bad;
var attribute:;
run;

proc sql;
create table wantt as
select 
    a.att, 
    b.att as col, 
    case when a.att=b.att then . 
         else sum(a.col1 and b.col1) end as n,
    case when a.att=b.att then . 
         else sum(a.col1 and b.col1 and a.BAD="Y") / 
            (select count(*) from have) end as p format=percent7.2
from 
    havet as a inner join 
    havet as b on a.id=b.id
group by a.att, b.att;
quit;

proc transpose data=wantt out=want1(drop=_name_);
by att;
id col;
var n;
run;

proc transpose data=wantt out=want2(drop=_name_);
by att;
id col;
var p;
run;

PG
Ksharp
Super User
It is easy for IML.


data have;
input ID   attribute1   attribute2   attribute3     BAD $;
datalines;
1        1               0               0                 Y
2        1               1               0                 N
3        1               0               1                 Y
4        0               1               1                 N
5        1               1               0                 Y
6        1               1               0                 N
7        1               0               1                 Y
8        1               0               1                 N
;

proc iml;
use have nobs nobs;
read all var{attribute1 attribute2 attribute3} into x[c=vnames];
read all var{bad};
close;

xpx=x`*x;
diag=do(1,ncol(xpx)##2,ncol(xpx)+1);
xpx[diag]=.;
print xpx[r=vnames c=vnames l=''];

idx=loc(bad='Y');
temp=x[idx,];
per=(temp`*temp)/nobs;
per[diag]=.;
print per[r=vnames c=vnames l='' f=percent8.2];

quit;


sivakoya
Obsidian | Level 7

@Ksharp this is awesome! Thanks for your help on this.

rogerjdeangelis
Barite | Level 11
Survival Probabilities for combinations of risk factors

for an R solution based on the SAS Forum IML solution see
https://listserv.uga.edu/cgi-bin/wa?A2=SAS-L;f541a941.1610a

Slightly reformated

HAVE I have a following dataset

ID   Male Smoker Over60  Survived
TED   1       0       0        1
JIM   1       1       0        0
TEX   1       0       1        1
LIZ   0       1       1        0
JOE   1       1       0        1 Out of all Male Smokers only Joe survived 1/8 12.5% survival rate
TIM   1       1       0        0
TAD   1       0       1        1
TOM   1       0       1        0


WANT

For Male Smokers Jim, Joe and Tim. Only Joe survived 1/8 = .125
For Males  over 60 Tex, Tad and Tom, Tex and Tad Survived = 2/8 = .25
For Smokers over 60. NO one survived  0.0


           MALE  SMOKER    OVER60
MALE         _   12.50%       25%  (TAD + TOM survived)/8 records
SMOKER  12.50%       _         0%
OVER60     25%       0%         _

SAS SOLUTION
============

* after normalizing you can generalize with many sas procedures
  beter solution then this one?;

data _null_;
  retain  survive1 survive2 survive3 0;
  format survive: percent7.1;
  infile datalines eof=dne;
  input ID $3.  Male Smoker Over60  Survived;
  grp1='Male_Smoker_Survive   ';survive1=survive1+male*smoker*survived;
  grp2='Male_Over60_Survive   ';survive2=survive2+male*over60*survived;
  grp3='Smoker_Over60_Survive ';survive3=survive3+smoker*over60*survived;
  return;
  dne:
     _n_=_n_-1;
     survive1=survive1/_n_;
     survive2=survive2/_n_;
     survive3=survive3/_n_;
     put              @10 "Male "    @20 "Smoker"  @30 "Over60"   ;
     put @1 "Male"    @10 "-"        @20 Survive1  @30 Survive2 ;
     put @1 "Smoker"  @10 Survive1   @20 "-"       @30 Survive3 ;
     put @1 "Over60"  @10 Survive2   @20 Survive3  @30 "-"        ;
datalines;
TED 1 0 0 1
JIM 1 1 0 0
TEX 1 0 1 1
LIZ 0 1 1 0
JOE 1 1 0 1
TIM 1 1 0 0
TAD 1 0 1 1
TOM 1 0 1 0
;;;;
run;quit;

         Male      Smoker    Over60
Male     -         12.5%     25.0%
Smoker   12.5%     -         0.0%
Over60   25.0%     0.0%      -
sivakoya
Obsidian | Level 7

@Ksharp

 

idx=loc(bad='Y');
temp=x[idx,];

 

Hi, If the dataset doesnt have any Bad='Y'. I am getting the below error.

 

MPRINT(STIP): temp=x[idx,];

ERROR: (execution) Matrix has not been set to a value.

 

 

Is there any work around to this?

 

Thanks

Ksharp
Super User

Oh. Sure. Try this one .


data have;
input ID   attribute1   attribute2   attribute3     BAD $;
datalines;
1        1               0               0                 Y
2        1               1               0                 N
3        1               0               1                 Y
4        0               1               1                 N
5        1               1               0                 Y
6        1               1               0                 N
7        1               0               1                 Y
8        1               0               1                 N
;

proc iml;
use have nobs nobs;
read all var{attribute1 attribute2 attribute3} into x[c=vnames];
read all var{bad};
close;

xpx=x`*x;
diag=do(1,ncol(xpx)##2,ncol(xpx)+1);
xpx[diag]=.;
print xpx[r=vnames c=vnames l=''];

idx=loc(bad='Y');
if isempty(idx) then print "Bad don't have a Y";
else do;
temp=x[idx,];
per=(temp`*temp)/nobs;
per[diag]=.;
print per[r=vnames c=vnames l='' f=percent8.2];
end;
quit;



sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 3891 views
  • 6 likes
  • 5 in conversation