I have a following dataset
ID attribute1 attribute2 attribute3 BAD (Y-> yes)
1 1 0 0 Y
2 1 1 0 N
3 1 0 1 Y
4 0 1 1 N
5 1 1 0 Y
6 1 1 0 N
7 1 0 1 Y
8 1 0 1 N
I want to create frequency count matrix, for each combination of attribute tripped (atrribute = 1 means tripped) like following:
attribute1 attribute2 attribute3
attribute1 _ 3 3
attribute2 3 _ 1
attribute3 3 1 _
It represents counts of attributes tripped together.
Once i have the counts I need to calcualte %BAD in each combination like below.
attribute1 attribute2 attribute3
attribute1 _ 12.50% 25%
attribute2 12.50% _ 0%
attribute3 25% 0% _
it represents out of 3 ID's for the combination atribute1 and atribute2 tripped, one ID went BAD -> so 12.5% BAD
And out of 3 ID's for the combination attribute1 and attribute1 tripped, two went BAD -> so 25% BAD
And the 1 ID for the combination attribute2 and attribute3 tripped, didnt went BAD -> so 0% BAD
Thanks
It is easy for IML. data have; input ID attribute1 attribute2 attribute3 BAD $; datalines; 1 1 0 0 Y 2 1 1 0 N 3 1 0 1 Y 4 0 1 1 N 5 1 1 0 Y 6 1 1 0 N 7 1 0 1 Y 8 1 0 1 N ; proc iml; use have nobs nobs; read all var{attribute1 attribute2 attribute3} into x[c=vnames]; read all var{bad}; close; xpx=x`*x; diag=do(1,ncol(xpx)##2,ncol(xpx)+1); xpx[diag]=.; print xpx[r=vnames c=vnames l='']; idx=loc(bad='Y'); temp=x[idx,]; per=(temp`*temp)/nobs; per[diag]=.; print per[r=vnames c=vnames l='' f=percent8.2]; quit;
For part 1 you can use the output from Proc Corr - though you'll have to set the diagonal to missing manually.
data have;
input ID var1 var2 var3 BAD $;
cards;
1 1 0 0 Y
2 1 1 0 N
3 1 0 1 Y
4 0 1 1 N
5 1 1 0 Y
6 1 1 0 N
7 1 0 1 Y
8 1 0 1 N
;
run;
ods output sscp=distance;
proc corr data=have sscp;
var var1-var3;
run;
proc print data=distance;
run;
Thank you for the solution. Any ideas on the second part?
Nothing simple.
Perhaps proc freq?
Here's a simple example but would need modification for the two way table.
https://gist.github.com/statgeek/e0903d269d4a71316a4e
thanks. will try that.
tried this:
PROC TABULATE
DATA=WORK.TEST
;
CLASS attribute1 / ORDER=UNFORMATTED MISSING;
CLASS attribute2 / ORDER=UNFORMATTED MISSING;
CLASS attribute3 / ORDER=UNFORMATTED MISSING;
TABLE /* Row Dimension */
attribute3
attribute1
attribute2,
/* Column Dimension */
attribute3
attribute1
attribute2;
;
FREQ BAD;
RUN;
but this gives. just need a way to supress proc tabulate output to just 1's.
attribute3 attribute1 attribute2
0 1 1 0 1
attribute3 2 . 2 1 1
0
1 . 2 2 2 .
attribute1 2 2 4 3 1
1
attribute2 1 2 3 3 .
0
1 1 . 1 . 1
SQL and transpose can do this:
data have;
input ID attribute1 attribute2 attribute3 BAD $;
datalines;
1 1 0 0 Y
2 1 1 0 N
3 1 0 1 Y
4 0 1 1 N
5 1 1 0 Y
6 1 1 0 N
7 1 0 1 Y
8 1 0 1 N
;
proc transpose data=have out=havet name=att;
by id bad;
var attribute:;
run;
proc sql;
create table wantt as
select
a.att,
b.att as col,
case when a.att=b.att then .
else sum(a.col1 and b.col1) end as n,
case when a.att=b.att then .
else sum(a.col1 and b.col1 and a.BAD="Y") /
(select count(*) from have) end as p format=percent7.2
from
havet as a inner join
havet as b on a.id=b.id
group by a.att, b.att;
quit;
proc transpose data=wantt out=want1(drop=_name_);
by att;
id col;
var n;
run;
proc transpose data=wantt out=want2(drop=_name_);
by att;
id col;
var p;
run;
Thank you @PGStats
It is easy for IML. data have; input ID attribute1 attribute2 attribute3 BAD $; datalines; 1 1 0 0 Y 2 1 1 0 N 3 1 0 1 Y 4 0 1 1 N 5 1 1 0 Y 6 1 1 0 N 7 1 0 1 Y 8 1 0 1 N ; proc iml; use have nobs nobs; read all var{attribute1 attribute2 attribute3} into x[c=vnames]; read all var{bad}; close; xpx=x`*x; diag=do(1,ncol(xpx)##2,ncol(xpx)+1); xpx[diag]=.; print xpx[r=vnames c=vnames l='']; idx=loc(bad='Y'); temp=x[idx,]; per=(temp`*temp)/nobs; per[diag]=.; print per[r=vnames c=vnames l='' f=percent8.2]; quit;
@Ksharp this is awesome! Thanks for your help on this.
Survival Probabilities for combinations of risk factors
for an R solution based on the SAS Forum IML solution see
https://listserv.uga.edu/cgi-bin/wa?A2=SAS-L;f541a941.1610a
Slightly reformated
HAVE I have a following dataset
ID Male Smoker Over60 Survived
TED 1 0 0 1
JIM 1 1 0 0
TEX 1 0 1 1
LIZ 0 1 1 0
JOE 1 1 0 1 Out of all Male Smokers only Joe survived 1/8 12.5% survival rate
TIM 1 1 0 0
TAD 1 0 1 1
TOM 1 0 1 0
WANT
For Male Smokers Jim, Joe and Tim. Only Joe survived 1/8 = .125
For Males over 60 Tex, Tad and Tom, Tex and Tad Survived = 2/8 = .25
For Smokers over 60. NO one survived 0.0
MALE SMOKER OVER60
MALE _ 12.50% 25% (TAD + TOM survived)/8 records
SMOKER 12.50% _ 0%
OVER60 25% 0% _
SAS SOLUTION
============
* after normalizing you can generalize with many sas procedures
beter solution then this one?;
data _null_;
retain survive1 survive2 survive3 0;
format survive: percent7.1;
infile datalines eof=dne;
input ID $3. Male Smoker Over60 Survived;
grp1='Male_Smoker_Survive ';survive1=survive1+male*smoker*survived;
grp2='Male_Over60_Survive ';survive2=survive2+male*over60*survived;
grp3='Smoker_Over60_Survive ';survive3=survive3+smoker*over60*survived;
return;
dne:
_n_=_n_-1;
survive1=survive1/_n_;
survive2=survive2/_n_;
survive3=survive3/_n_;
put @10 "Male " @20 "Smoker" @30 "Over60" ;
put @1 "Male" @10 "-" @20 Survive1 @30 Survive2 ;
put @1 "Smoker" @10 Survive1 @20 "-" @30 Survive3 ;
put @1 "Over60" @10 Survive2 @20 Survive3 @30 "-" ;
datalines;
TED 1 0 0 1
JIM 1 1 0 0
TEX 1 0 1 1
LIZ 0 1 1 0
JOE 1 1 0 1
TIM 1 1 0 0
TAD 1 0 1 1
TOM 1 0 1 0
;;;;
run;quit;
Male Smoker Over60
Male - 12.5% 25.0%
Smoker 12.5% - 0.0%
Over60 25.0% 0.0% -
idx=loc(bad='Y');
temp=x[idx,];
Hi, If the dataset doesnt have any Bad='Y'. I am getting the below error.
MPRINT(STIP): temp=x[idx,];
ERROR: (execution) Matrix has not been set to a value.
Is there any work around to this?
Thanks
Oh. Sure. Try this one . data have; input ID attribute1 attribute2 attribute3 BAD $; datalines; 1 1 0 0 Y 2 1 1 0 N 3 1 0 1 Y 4 0 1 1 N 5 1 1 0 Y 6 1 1 0 N 7 1 0 1 Y 8 1 0 1 N ; proc iml; use have nobs nobs; read all var{attribute1 attribute2 attribute3} into x[c=vnames]; read all var{bad}; close; xpx=x`*x; diag=do(1,ncol(xpx)##2,ncol(xpx)+1); xpx[diag]=.; print xpx[r=vnames c=vnames l='']; idx=loc(bad='Y'); if isempty(idx) then print "Bad don't have a Y"; else do; temp=x[idx,]; per=(temp`*temp)/nobs; per[diag]=.; print per[r=vnames c=vnames l='' f=percent8.2]; end; quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.