Solved
Contributor
Posts: 44

# How to summarize similar data from multiple columns?

I have a data set that includes severl "defect" columns.  Up to three defects can be recorded in each row.  I'm struggling to write code that will give me a summary of occurrances of all "defects".  For example, take the following data:

0,0,0

1,0,0

1,2,0

5,0,1

0,0,0

0,1,0

0,0,0

5,0,0

0,0,9

0,0,0

There are 10 rows.  Defect number 1 (one) occurs 4 times (40%), defect 2 occurs 1 time (10%), defect 5 occurs 2 times (20%).  I've put together some SAS code that will summarize one column.  I'm having trouble figuring out how to do this for defects in multiple columns.  I also want to add a textual discription of each defect to the output table but I want to get the numbers correct first.  Here is some working sample code.  Can anyone steer me in the right direction?

%let nanimals=10;

data check3;

infile datalines dlm=',';

input defect1 defect2 defect3;

datalines;

0,0,0

1,0,0

1,2,0

5,0,1

0,0,0

0,1,0

0,0,0

5,0,0

0,0,9

0,0,0

;

proc sort; by defect1;

proc summary data=check3; by defect1;  where ((defect1>0 and defect1<20) or (defect2>0 and defect2<20) or (defect3>0 and defect3<20));

var defect2 defect3;

output out=check1 sum=defect2 defect3 n=total;

data check2; set check1;

percent=100*total/&nanimals;

format percent 5.1;

proc print noobs;

var total percent;

title Defect Summary;

run;

Accepted Solutions
Solution
‎03-08-2012 01:44 PM
Super Contributor
Posts: 1,636

## Re: How to summarize similar data from multiple columns?

another approach:

data check3;

infile datalines dlm=',';

input defect1 defect2 defect3;

datalines;

0,0,0

1,0,0

1,2,0

5,0,1

0,0,0

0,1,0

0,0,0

5,0,0

0,0,9

0,0,0

;

data temp(keep=n defect);

set check3 nobs=nobs;

array df(*) def:;

do _n_=1 to dim(df);

defect=df(_n_);

if defect>0 then  do;

n=nobs;

output; end;

end;

run;

proc sql;

create table want as

select distinct defect,count(*) as total,calculated total/n as ratio

from temp

group by defect;

quit;

proc print;run;

obs    defect    total    ratio

1        1        4       0.4

2        2        1       0.1

3        5        2       0.2

4        9        1       0.1

Linlin

All Replies
Posts: 3,167

## Re: How to summarize similar data from multiple columns?

You could try using Array() to first get the number:

data check3;

infile datalines dlm=',';

input defect1 defect2 defect3;

datalines;

0,0,0

1,0,0

1,2,0

5,0,1

0,0,0

0,1,0

0,0,0

5,0,0

0,0,9

0,0,0

;

data want (drop=defect;

set check3 end=last nobs=nobs;

array df (*) _numeric_;

do _n_=1 to dim(df);

d1+ifn(df(_n_)=1,1,0);

d2+ifn(df(_n_)=2,1,0);

d5+ifn(df(_n_)=5,1,0);

end;

if last then do;

d1_rate=d1/nobs;

d2_rate=d2/nobs;

d5_rate=d5/nobs;

output;

end;

run;

proc print;run;

Regards,

Haikuo

Super Contributor
Posts: 266

## Re: How to summarize similar data from multiple columns?

If that was an SQL data set, I would run this:

select defect, count(defect)/counter as defect_percent

from (

SELECT one as defect, (SELECT count(one) from defects) as counter from defects

union all

SELECT two as defect,  (SELECT count(two) from defects) from defects

union all

SELECT three as defect, (SELECT count(three) from defects) from defects

)

where defect>0

group by defect, counter

order by defect

which yeilds this:

defectdefect_percent
10.4
20.1
50.2
90.1
Query14

I'm not good enough at SAS yet to integrate that but maybe you are.  Just an idea.

Solution
‎03-08-2012 01:44 PM
Super Contributor
Posts: 1,636

## Re: How to summarize similar data from multiple columns?

another approach:

data check3;

infile datalines dlm=',';

input defect1 defect2 defect3;

datalines;

0,0,0

1,0,0

1,2,0

5,0,1

0,0,0

0,1,0

0,0,0

5,0,0

0,0,9

0,0,0

;

data temp(keep=n defect);

set check3 nobs=nobs;

array df(*) def:;

do _n_=1 to dim(df);

defect=df(_n_);

if defect>0 then  do;

n=nobs;

output; end;

end;

run;

proc sql;

create table want as

select distinct defect,count(*) as total,calculated total/n as ratio

from temp

group by defect;

quit;

proc print;run;

obs    defect    total    ratio

1        1        4       0.4

2        2        1       0.1

3        5        2       0.2

4        9        1       0.1

Linlin

Super User
Posts: 13,583

## Re: How to summarize similar data from multiple columns?

The text for descriptions is the easy part. Proc Format to the rescue!

Proc format;

value defect

0 = 'No defect'

1 = 'Text describing defect 1'

2 = 'Text describing defect 2'

(repeat as needed)

;

run;

Then in your final print or other report associate the defect variable(s) with the format.

format defect defect.;

Contributor
Posts: 44

## How to summarize similar data from multiple columns?

Thanks Ballardw.  I did not know how to do this using proc format.  Instead, I was merging tables of descriptions.

🔒 This topic is solved and locked.