I have a following dataset
ID attribute1 attribute2 attribute3 BAD (Y-> yes)
1 1 0 0 Y
2 1 1 0 N
3 1 0 1 Y
4 0 1 1 N
5 1 1 0 Y
6 1 1 0 N
7 1 0 1 Y
8 1 0 1 N
I want to create frequency count matrix, for each combination of attribute tripped (atrribute = 1 means tripped) like following:
attribute1 attribute2 attribute3
attribute1 _ 3 3
attribute2 3 _ 1
attribute3 3 1 _
It represents counts of attributes tripped together.
Once i have the counts I need to calcualte %BAD in each combination like below.
attribute1 attribute2 attribute3
attribute1 _ 12.50% 25%
attribute2 12.50% _ 0%
attribute3 25% 0% _
it represents out of 3 ID's for the combination atribute1 and atribute2 tripped, one ID went BAD -> so 12.5% BAD
And out of 3 ID's for the combination attribute1 and attribute1 tripped, two went BAD -> so 25% BAD
And the 1 ID for the combination attribute2 and attribute3 tripped, didnt went BAD -> so 0% BAD
Thanks
It is easy for IML. data have; input ID attribute1 attribute2 attribute3 BAD $; datalines; 1 1 0 0 Y 2 1 1 0 N 3 1 0 1 Y 4 0 1 1 N 5 1 1 0 Y 6 1 1 0 N 7 1 0 1 Y 8 1 0 1 N ; proc iml; use have nobs nobs; read all var{attribute1 attribute2 attribute3} into x[c=vnames]; read all var{bad}; close; xpx=x`*x; diag=do(1,ncol(xpx)##2,ncol(xpx)+1); xpx[diag]=.; print xpx[r=vnames c=vnames l='']; idx=loc(bad='Y'); temp=x[idx,]; per=(temp`*temp)/nobs; per[diag]=.; print per[r=vnames c=vnames l='' f=percent8.2]; quit;
For part 1 you can use the output from Proc Corr - though you'll have to set the diagonal to missing manually.
data have;
input ID var1 var2 var3 BAD $;
cards;
1 1 0 0 Y
2 1 1 0 N
3 1 0 1 Y
4 0 1 1 N
5 1 1 0 Y
6 1 1 0 N
7 1 0 1 Y
8 1 0 1 N
;
run;
ods output sscp=distance;
proc corr data=have sscp;
var var1-var3;
run;
proc print data=distance;
run;
Thank you for the solution. Any ideas on the second part?
Nothing simple.
Perhaps proc freq?
Here's a simple example but would need modification for the two way table.
https://gist.github.com/statgeek/e0903d269d4a71316a4e
thanks. will try that.
tried this:
PROC TABULATE
DATA=WORK.TEST
;
CLASS attribute1 / ORDER=UNFORMATTED MISSING;
CLASS attribute2 / ORDER=UNFORMATTED MISSING;
CLASS attribute3 / ORDER=UNFORMATTED MISSING;
TABLE /* Row Dimension */
attribute3
attribute1
attribute2,
/* Column Dimension */
attribute3
attribute1
attribute2;
;
FREQ BAD;
RUN;
but this gives. just need a way to supress proc tabulate output to just 1's.
attribute3 attribute1 attribute2
0 1 1 0 1
attribute3 2 . 2 1 1
0
1 . 2 2 2 .
attribute1 2 2 4 3 1
1
attribute2 1 2 3 3 .
0
1 1 . 1 . 1
SQL and transpose can do this:
data have;
input ID attribute1 attribute2 attribute3 BAD $;
datalines;
1 1 0 0 Y
2 1 1 0 N
3 1 0 1 Y
4 0 1 1 N
5 1 1 0 Y
6 1 1 0 N
7 1 0 1 Y
8 1 0 1 N
;
proc transpose data=have out=havet name=att;
by id bad;
var attribute:;
run;
proc sql;
create table wantt as
select
a.att,
b.att as col,
case when a.att=b.att then .
else sum(a.col1 and b.col1) end as n,
case when a.att=b.att then .
else sum(a.col1 and b.col1 and a.BAD="Y") /
(select count(*) from have) end as p format=percent7.2
from
havet as a inner join
havet as b on a.id=b.id
group by a.att, b.att;
quit;
proc transpose data=wantt out=want1(drop=_name_);
by att;
id col;
var n;
run;
proc transpose data=wantt out=want2(drop=_name_);
by att;
id col;
var p;
run;
Thank you @PGStats
It is easy for IML. data have; input ID attribute1 attribute2 attribute3 BAD $; datalines; 1 1 0 0 Y 2 1 1 0 N 3 1 0 1 Y 4 0 1 1 N 5 1 1 0 Y 6 1 1 0 N 7 1 0 1 Y 8 1 0 1 N ; proc iml; use have nobs nobs; read all var{attribute1 attribute2 attribute3} into x[c=vnames]; read all var{bad}; close; xpx=x`*x; diag=do(1,ncol(xpx)##2,ncol(xpx)+1); xpx[diag]=.; print xpx[r=vnames c=vnames l='']; idx=loc(bad='Y'); temp=x[idx,]; per=(temp`*temp)/nobs; per[diag]=.; print per[r=vnames c=vnames l='' f=percent8.2]; quit;
@Ksharp this is awesome! Thanks for your help on this.
Survival Probabilities for combinations of risk factors
for an R solution based on the SAS Forum IML solution see
https://listserv.uga.edu/cgi-bin/wa?A2=SAS-L;f541a941.1610a
Slightly reformated
HAVE I have a following dataset
ID Male Smoker Over60 Survived
TED 1 0 0 1
JIM 1 1 0 0
TEX 1 0 1 1
LIZ 0 1 1 0
JOE 1 1 0 1 Out of all Male Smokers only Joe survived 1/8 12.5% survival rate
TIM 1 1 0 0
TAD 1 0 1 1
TOM 1 0 1 0
WANT
For Male Smokers Jim, Joe and Tim. Only Joe survived 1/8 = .125
For Males over 60 Tex, Tad and Tom, Tex and Tad Survived = 2/8 = .25
For Smokers over 60. NO one survived 0.0
MALE SMOKER OVER60
MALE _ 12.50% 25% (TAD + TOM survived)/8 records
SMOKER 12.50% _ 0%
OVER60 25% 0% _
SAS SOLUTION
============
* after normalizing you can generalize with many sas procedures
beter solution then this one?;
data _null_;
retain survive1 survive2 survive3 0;
format survive: percent7.1;
infile datalines eof=dne;
input ID $3. Male Smoker Over60 Survived;
grp1='Male_Smoker_Survive ';survive1=survive1+male*smoker*survived;
grp2='Male_Over60_Survive ';survive2=survive2+male*over60*survived;
grp3='Smoker_Over60_Survive ';survive3=survive3+smoker*over60*survived;
return;
dne:
_n_=_n_-1;
survive1=survive1/_n_;
survive2=survive2/_n_;
survive3=survive3/_n_;
put @10 "Male " @20 "Smoker" @30 "Over60" ;
put @1 "Male" @10 "-" @20 Survive1 @30 Survive2 ;
put @1 "Smoker" @10 Survive1 @20 "-" @30 Survive3 ;
put @1 "Over60" @10 Survive2 @20 Survive3 @30 "-" ;
datalines;
TED 1 0 0 1
JIM 1 1 0 0
TEX 1 0 1 1
LIZ 0 1 1 0
JOE 1 1 0 1
TIM 1 1 0 0
TAD 1 0 1 1
TOM 1 0 1 0
;;;;
run;quit;
Male Smoker Over60
Male - 12.5% 25.0%
Smoker 12.5% - 0.0%
Over60 25.0% 0.0% -
idx=loc(bad='Y');
temp=x[idx,];
Hi, If the dataset doesnt have any Bad='Y'. I am getting the below error.
MPRINT(STIP): temp=x[idx,];
ERROR: (execution) Matrix has not been set to a value.
Is there any work around to this?
Thanks
Oh. Sure. Try this one . data have; input ID attribute1 attribute2 attribute3 BAD $; datalines; 1 1 0 0 Y 2 1 1 0 N 3 1 0 1 Y 4 0 1 1 N 5 1 1 0 Y 6 1 1 0 N 7 1 0 1 Y 8 1 0 1 N ; proc iml; use have nobs nobs; read all var{attribute1 attribute2 attribute3} into x[c=vnames]; read all var{bad}; close; xpx=x`*x; diag=do(1,ncol(xpx)##2,ncol(xpx)+1); xpx[diag]=.; print xpx[r=vnames c=vnames l='']; idx=loc(bad='Y'); if isempty(idx) then print "Bad don't have a Y"; else do; temp=x[idx,]; per=(temp`*temp)/nobs; per[diag]=.; print per[r=vnames c=vnames l='' f=percent8.2]; end; quit;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.