I have a data set that includes severl "defect" columns. Up to three defects can be recorded in each row. I'm struggling to write code that will give me a summary of occurrances of all "defects". For example, take the following data:
0,0,0
1,0,0
1,2,0
5,0,1
0,0,0
0,1,0
0,0,0
5,0,0
0,0,9
0,0,0
There are 10 rows. Defect number 1 (one) occurs 4 times (40%), defect 2 occurs 1 time (10%), defect 5 occurs 2 times (20%). I've put together some SAS code that will summarize one column. I'm having trouble figuring out how to do this for defects in multiple columns. I also want to add a textual discription of each defect to the output table but I want to get the numbers correct first. Here is some working sample code. Can anyone steer me in the right direction?
%let nanimals=10;
data check3;
infile datalines dlm=',';
input defect1 defect2 defect3;
datalines;
0,0,0
1,0,0
1,2,0
5,0,1
0,0,0
0,1,0
0,0,0
5,0,0
0,0,9
0,0,0
;
proc sort; by defect1;
proc summary data=check3; by defect1; where ((defect1>0 and defect1<20) or (defect2>0 and defect2<20) or (defect3>0 and defect3<20));
var defect2 defect3;
output out=check1 sum=defect2 defect3 n=total;
data check2; set check1;
percent=100*total/&nanimals;
format percent 5.1;
proc print noobs;
var total percent;
title Defect Summary;
run;
another approach:
data check3;
infile datalines dlm=',';
input defect1 defect2 defect3;
datalines;
0,0,0
1,0,0
1,2,0
5,0,1
0,0,0
0,1,0
0,0,0
5,0,0
0,0,9
0,0,0
;
data temp(keep=n defect);
set check3 nobs=nobs;
array df(*) def:;
do _n_=1 to dim(df);
defect=df(_n_);
if defect>0 then do;
n=nobs;
output; end;
end;
run;
proc sql;
create table want as
select distinct defect,count(*) as total,calculated total/n as ratio
from temp
group by defect;
quit;
proc print;run;
obs defect total ratio
1 1 4 0.4
2 2 1 0.1
3 5 2 0.2
4 9 1 0.1
Linlin
You could try using Array() to first get the number:
data check3;
infile datalines dlm=',';
input defect1 defect2 defect3;
datalines;
0,0,0
1,0,0
1,2,0
5,0,1
0,0,0
0,1,0
0,0,0
5,0,0
0,0,9
0,0,0
;
data want (drop=defect:);
set check3 end=last nobs=nobs;
array df (*) _numeric_;
do _n_=1 to dim(df);
d1+ifn(df(_n_)=1,1,0);
d2+ifn(df(_n_)=2,1,0);
d5+ifn(df(_n_)=5,1,0);
end;
if last then do;
d1_rate=d1/nobs;
d2_rate=d2/nobs;
d5_rate=d5/nobs;
output;
end;
run;
proc print;run;
Regards,
Haikuo
Edit to add the ratio.
If that was an SQL data set, I would run this:
select defect, count(defect)/counter as defect_percent
from (
SELECT one as defect, (SELECT count(one) from defects) as counter from defects
union all
SELECT two as defect, (SELECT count(two) from defects) from defects
union all
SELECT three as defect, (SELECT count(three) from defects) from defects
)
where defect>0
group by defect, counter
order by defect
which yeilds this:
defect | defect_percent |
---|---|
1 | 0.4 |
2 | 0.1 |
5 | 0.2 |
9 | 0.1 |
I'm not good enough at SAS yet to integrate that but maybe you are. Just an idea.
another approach:
data check3;
infile datalines dlm=',';
input defect1 defect2 defect3;
datalines;
0,0,0
1,0,0
1,2,0
5,0,1
0,0,0
0,1,0
0,0,0
5,0,0
0,0,9
0,0,0
;
data temp(keep=n defect);
set check3 nobs=nobs;
array df(*) def:;
do _n_=1 to dim(df);
defect=df(_n_);
if defect>0 then do;
n=nobs;
output; end;
end;
run;
proc sql;
create table want as
select distinct defect,count(*) as total,calculated total/n as ratio
from temp
group by defect;
quit;
proc print;run;
obs defect total ratio
1 1 4 0.4
2 2 1 0.1
3 5 2 0.2
4 9 1 0.1
Linlin
The text for descriptions is the easy part. Proc Format to the rescue!
Proc format;
value defect
0 = 'No defect'
1 = 'Text describing defect 1'
2 = 'Text describing defect 2'
(repeat as needed)
;
run;
Then in your final print or other report associate the defect variable(s) with the format.
format defect defect.;
Thanks Ballardw. I did not know how to do this using proc format. Instead, I was merging tables of descriptions.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.