Solved: Re: How to summarize similar data from multiple columns?

WesBarris · Posted 03-08-2012 12:14 PM

I have a data set that includes severl "defect" columns. Up to three defects can be recorded in each row. I'm struggling to write code that will give me a summary of occurrances of all "defects". For example, take the following data:

0,0,0

1,0,0

1,2,0

5,0,1

0,0,0

0,1,0

0,0,0

5,0,0

0,0,9

0,0,0

There are 10 rows. Defect number 1 (one) occurs 4 times (40%), defect 2 occurs 1 time (10%), defect 5 occurs 2 times (20%). I've put together some SAS code that will summarize one column. I'm having trouble figuring out how to do this for defects in multiple columns. I also want to add a textual discription of each defect to the output table but I want to get the numbers correct first. Here is some working sample code. Can anyone steer me in the right direction?

%let nanimals=10;

data check3;

infile datalines dlm=',';

input defect1 defect2 defect3;

datalines;

0,0,0

1,0,0

1,2,0

5,0,1

0,0,0

0,1,0

0,0,0

5,0,0

0,0,9

0,0,0

;

proc sort; by defect1;

proc summary data=check3; by defect1; where ((defect1>0 and defect1<20) or (defect2>0 and defect2<20) or (defect3>0 and defect3<20));

var defect2 defect3;

output out=check1 sum=defect2 defect3 n=total;

data check2; set check1;

percent=100*total/&nanimals;

format percent 5.1;

proc print noobs;

var total percent;

title Defect Summary;

run;

Linlin · Posted 03-08-2012 01:44 PM

another approach:

data check3;

infile datalines dlm=',';

input defect1 defect2 defect3;

datalines;

0,0,0

1,0,0

1,2,0

5,0,1

0,0,0

0,1,0

0,0,0

5,0,0

0,0,9

0,0,0

;

data temp(keep=n defect);

set check3 nobs=nobs;

array df(*) def:;

do _n_=1 to dim(df);

defect=df(_n_);

if defect>0 then do;

n=nobs;

output; end;

end;

run;

proc sql;

create table want as

select distinct defect,count(*) as total,calculated total/n as ratio

from temp

group by defect;

quit;

proc print;run;

obs defect total ratio

1 1 4 0.4

2 2 1 0.1

3 5 2 0.2

4 9 1 0.1

Linlin

View solution in original post

Haikuo · Posted 03-08-2012 12:25 PM

You could try using Array() to first get the number:

data check3;

infile datalines dlm=',';

input defect1 defect2 defect3;

datalines;

0,0,0

1,0,0

1,2,0

5,0,1

0,0,0

0,1,0

0,0,0

5,0,0

0,0,9

0,0,0

;

data want (drop=defect:);

set check3 end=last nobs=nobs;

array df (*) _numeric_;

do _n_=1 to dim(df);

d1+ifn(df(_n_)=1,1,0);

d2+ifn(df(_n_)=2,1,0);

d5+ifn(df(_n_)=5,1,0);

end;

if last then do;

d1_rate=d1/nobs;

d2_rate=d2/nobs;

d5_rate=d5/nobs;

output;

end;

run;

proc print;run;

Regards,

Haikuo

Edit to add the ratio.

HB · Posted 03-08-2012 01:14 PM

If that was an SQL data set, I would run this:

select defect, count(defect)/counter as defect_percent

from (

SELECT one as defect, (SELECT count(one) from defects) as counter from defects

union all

SELECT two as defect, (SELECT count(two) from defects) from defects

union all

SELECT three as defect, (SELECT count(three) from defects) from defects

)

where defect>0

group by defect, counter

order by defect

which yeilds this:

**Query14**
defect	defect_percent
1	0.4
2	0.1
5	0.2
9	0.1

I'm not good enough at SAS yet to integrate that but maybe you are. Just an idea.

Linlin · Posted 03-08-2012 01:44 PM

another approach:

data check3;

infile datalines dlm=',';

input defect1 defect2 defect3;

datalines;

0,0,0

1,0,0

1,2,0

5,0,1

0,0,0

0,1,0

0,0,0

5,0,0

0,0,9

0,0,0

;

data temp(keep=n defect);

set check3 nobs=nobs;

array df(*) def:;

do _n_=1 to dim(df);

defect=df(_n_);

if defect>0 then do;

n=nobs;

output; end;

end;

run;

proc sql;

create table want as

select distinct defect,count(*) as total,calculated total/n as ratio

from temp

group by defect;

quit;

proc print;run;

obs defect total ratio

1 1 4 0.4

2 2 1 0.1

3 5 2 0.2

4 9 1 0.1

Linlin

ballardw · Posted 03-08-2012 01:55 PM

The text for descriptions is the easy part. Proc Format to the rescue!

Proc format;

value defect

0 = 'No defect'

1 = 'Text describing defect 1'

2 = 'Text describing defect 2'

(repeat as needed)

;

run;

Then in your final print or other report associate the defect variable(s) with the format.

format defect defect.;

WesBarris · Posted 03-08-2012 03:26 PM

Thanks Ballardw. I did not know how to do this using proc format. Instead, I was merging tables of descriptions.

SAS Innovate 2025: Call for Content

Classroom Training Available!