BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sivakoya
Obsidian | Level 7

I have a following dataset

 

ID   attribute1   attribute2   attribute3     BAD (Y-> yes)
1        1               0               0                 Y
2        1               1               0                 N
3        1               0               1                 Y
4        0               1               1                 N
5        1               1               0                 Y
6        1               1               0                 N
7        1               0               1                 Y
8        1               0               1                 N

 

I want to create frequency count matrix, for each combination of attribute tripped (atrribute = 1 means tripped) like following:

 

                          attribute1    attribute2    attribute3
attribute1                  _                3                  3
attribute2                 3                 _                 1
attribute3                 3                 1                  _

 

It represents counts of attributes tripped together.

Once i have the counts I need to calcualte %BAD in each combination like below.

 


                      attribute1     attribute2     attribute3
attribute1             _               12.50%         25%
attribute2        12.50%             _                   0%
attribute3            25%              0%                _

 

it represents out of 3 ID's for the combination atribute1 and atribute2 tripped, one ID went BAD -> so 12.5% BAD

And out of 3 ID's for the combination attribute1 and attribute1 tripped, two went BAD -> so 25% BAD

And the 1 ID for the combination attribute2 and attribute3 tripped, didnt went BAD -> so 0% BAD

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
It is easy for IML.


data have;
input ID   attribute1   attribute2   attribute3     BAD $;
datalines;
1        1               0               0                 Y
2        1               1               0                 N
3        1               0               1                 Y
4        0               1               1                 N
5        1               1               0                 Y
6        1               1               0                 N
7        1               0               1                 Y
8        1               0               1                 N
;

proc iml;
use have nobs nobs;
read all var{attribute1 attribute2 attribute3} into x[c=vnames];
read all var{bad};
close;

xpx=x`*x;
diag=do(1,ncol(xpx)##2,ncol(xpx)+1);
xpx[diag]=.;
print xpx[r=vnames c=vnames l=''];

idx=loc(bad='Y');
temp=x[idx,];
per=(temp`*temp)/nobs;
per[diag]=.;
print per[r=vnames c=vnames l='' f=percent8.2];

quit;


View solution in original post

11 REPLIES 11
Reeza
Super User

For part 1 you can use the output from Proc Corr - though you'll have to set the diagonal to missing manually.

 

data have;
input ID   var1 var2 var3   BAD $;
cards;
1        1               0               0                 Y
2        1               1               0                 N
3        1               0               1                 Y
4        0               1               1                 N
5        1               1               0                 Y
6        1               1               0                 N
7        1               0               1                 Y
8        1               0               1                 N
;
run;

ods output sscp=distance;
proc corr data=have sscp;
var var1-var3;
run;
proc print data=distance;
run;

https://gist.github.com/statgeek/a5184a4e1678d81e2643

sivakoya
Obsidian | Level 7

Thank you for the solution. Any ideas on the second part?

Reeza
Super User

Nothing simple. 

Perhaps proc freq?

 

Here's a simple example but would need modification for the two way table.

https://gist.github.com/statgeek/e0903d269d4a71316a4e

 

sivakoya
Obsidian | Level 7

thanks. will try that.

 

tried this:

 

PROC TABULATE

DATA=WORK.TEST

;

CLASS attribute1 / ORDER=UNFORMATTED MISSING;

CLASS attribute2 / ORDER=UNFORMATTED MISSING;

CLASS attribute3 / ORDER=UNFORMATTED MISSING;

TABLE /* Row Dimension */

attribute3

attribute1

attribute2,

/* Column Dimension */

attribute3

attribute1

attribute2;

;

FREQ BAD;

RUN;

 

but this gives. just need a way to supress proc tabulate output to just 1's.

 

                  attribute3   attribute1  attribute2 
                         0 1       1             0 1

 

attribute3               2 .       2             1 1
0     
1                        . 2       2             2 .
attribute1               2 2       4             3 1
1      
attribute2               1 2       3             3 .
0     
1                        1 .       1             . 1

 

     
     
     
     
     
PGStats
Opal | Level 21

SQL and transpose can do this:

 

data have;
input ID   attribute1   attribute2   attribute3     BAD $;
datalines;
1        1               0               0                 Y
2        1               1               0                 N
3        1               0               1                 Y
4        0               1               1                 N
5        1               1               0                 Y
6        1               1               0                 N
7        1               0               1                 Y
8        1               0               1                 N
;

proc transpose data=have out=havet name=att;
by id bad;
var attribute:;
run;

proc sql;
create table wantt as
select 
    a.att, 
    b.att as col, 
    case when a.att=b.att then . 
         else sum(a.col1 and b.col1) end as n,
    case when a.att=b.att then . 
         else sum(a.col1 and b.col1 and a.BAD="Y") / 
            (select count(*) from have) end as p format=percent7.2
from 
    havet as a inner join 
    havet as b on a.id=b.id
group by a.att, b.att;
quit;

proc transpose data=wantt out=want1(drop=_name_);
by att;
id col;
var n;
run;

proc transpose data=wantt out=want2(drop=_name_);
by att;
id col;
var p;
run;

PG
Ksharp
Super User
It is easy for IML.


data have;
input ID   attribute1   attribute2   attribute3     BAD $;
datalines;
1        1               0               0                 Y
2        1               1               0                 N
3        1               0               1                 Y
4        0               1               1                 N
5        1               1               0                 Y
6        1               1               0                 N
7        1               0               1                 Y
8        1               0               1                 N
;

proc iml;
use have nobs nobs;
read all var{attribute1 attribute2 attribute3} into x[c=vnames];
read all var{bad};
close;

xpx=x`*x;
diag=do(1,ncol(xpx)##2,ncol(xpx)+1);
xpx[diag]=.;
print xpx[r=vnames c=vnames l=''];

idx=loc(bad='Y');
temp=x[idx,];
per=(temp`*temp)/nobs;
per[diag]=.;
print per[r=vnames c=vnames l='' f=percent8.2];

quit;


sivakoya
Obsidian | Level 7

@Ksharp this is awesome! Thanks for your help on this.

rogerjdeangelis
Barite | Level 11
Survival Probabilities for combinations of risk factors

for an R solution based on the SAS Forum IML solution see
https://listserv.uga.edu/cgi-bin/wa?A2=SAS-L;f541a941.1610a

Slightly reformated

HAVE I have a following dataset

ID   Male Smoker Over60  Survived
TED   1       0       0        1
JIM   1       1       0        0
TEX   1       0       1        1
LIZ   0       1       1        0
JOE   1       1       0        1 Out of all Male Smokers only Joe survived 1/8 12.5% survival rate
TIM   1       1       0        0
TAD   1       0       1        1
TOM   1       0       1        0


WANT

For Male Smokers Jim, Joe and Tim. Only Joe survived 1/8 = .125
For Males  over 60 Tex, Tad and Tom, Tex and Tad Survived = 2/8 = .25
For Smokers over 60. NO one survived  0.0


           MALE  SMOKER    OVER60
MALE         _   12.50%       25%  (TAD + TOM survived)/8 records
SMOKER  12.50%       _         0%
OVER60     25%       0%         _

SAS SOLUTION
============

* after normalizing you can generalize with many sas procedures
  beter solution then this one?;

data _null_;
  retain  survive1 survive2 survive3 0;
  format survive: percent7.1;
  infile datalines eof=dne;
  input ID $3.  Male Smoker Over60  Survived;
  grp1='Male_Smoker_Survive   ';survive1=survive1+male*smoker*survived;
  grp2='Male_Over60_Survive   ';survive2=survive2+male*over60*survived;
  grp3='Smoker_Over60_Survive ';survive3=survive3+smoker*over60*survived;
  return;
  dne:
     _n_=_n_-1;
     survive1=survive1/_n_;
     survive2=survive2/_n_;
     survive3=survive3/_n_;
     put              @10 "Male "    @20 "Smoker"  @30 "Over60"   ;
     put @1 "Male"    @10 "-"        @20 Survive1  @30 Survive2 ;
     put @1 "Smoker"  @10 Survive1   @20 "-"       @30 Survive3 ;
     put @1 "Over60"  @10 Survive2   @20 Survive3  @30 "-"        ;
datalines;
TED 1 0 0 1
JIM 1 1 0 0
TEX 1 0 1 1
LIZ 0 1 1 0
JOE 1 1 0 1
TIM 1 1 0 0
TAD 1 0 1 1
TOM 1 0 1 0
;;;;
run;quit;

         Male      Smoker    Over60
Male     -         12.5%     25.0%
Smoker   12.5%     -         0.0%
Over60   25.0%     0.0%      -
sivakoya
Obsidian | Level 7

@Ksharp

 

idx=loc(bad='Y');
temp=x[idx,];

 

Hi, If the dataset doesnt have any Bad='Y'. I am getting the below error.

 

MPRINT(STIP): temp=x[idx,];

ERROR: (execution) Matrix has not been set to a value.

 

 

Is there any work around to this?

 

Thanks

Ksharp
Super User

Oh. Sure. Try this one .


data have;
input ID   attribute1   attribute2   attribute3     BAD $;
datalines;
1        1               0               0                 Y
2        1               1               0                 N
3        1               0               1                 Y
4        0               1               1                 N
5        1               1               0                 Y
6        1               1               0                 N
7        1               0               1                 Y
8        1               0               1                 N
;

proc iml;
use have nobs nobs;
read all var{attribute1 attribute2 attribute3} into x[c=vnames];
read all var{bad};
close;

xpx=x`*x;
diag=do(1,ncol(xpx)##2,ncol(xpx)+1);
xpx[diag]=.;
print xpx[r=vnames c=vnames l=''];

idx=loc(bad='Y');
if isempty(idx) then print "Bad don't have a Y";
else do;
temp=x[idx,];
per=(temp`*temp)/nobs;
per[diag]=.;
print per[r=vnames c=vnames l='' f=percent8.2];
end;
quit;



sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 2702 views
  • 6 likes
  • 5 in conversation