Note that in the rows 2-3 of your proposed output the values of W and S are swapped. Assuming that it's a typo, you've already been offered a solution. One problem with it is that the output variables W, S, P are hard coded. It's fine when you know that is all you have. But in the real world with the data structured like yours - when the information about output metadata is embedded into the data, the aggregate variable names cannot be determined without a preliminary pass through the input: Doing otherwise would mean making assumption about the data. One way of avoiding this rather precarious practice is to analyze the input data set first by reading it in full, find out what the distinct prefixes in the values of the variables aa--dd are and then hard code them into a conditional structure, such as if-then-else. Another, more dynamic, approach is to let SAS do the job as shown below. This way, you don't care what the prefixes are: They are discovered in the first pass, and in the ensuing aggregating pass, the compiler knows the output variable names and types automatically. data have ;
input aa$ bb$ cc$ dd$ c1 c2 c3 c4 ;
cards ;
w1 p2 s4 w3 10 45 14 5
s1 s3 w2 p4 20 14 15 84
p2 w4 w5 w4 12 18 45 7
s1 w3 p5 s2 23 56 87 44
run ;
/* 1st pass: Find output vnames for 2nd pass */
data _null_ ;
do until (z) ;
set have end = z ;
array ch _char_ ;
length vv $ 32767 ;
do over ch ;
v = compress (ch, , "d") ;
if not findw (vv, cats (v)) then vv = catx (" ", vv, v) ;
end ;
end ;
call symputx ("vv", vv) ;
run ;
/* 2nd pass: Aggregate into var list &vv */
data want (keep = &vv) ;
set have ;
array ch _char_ ;
array nn _numeric_ ;
array vv (j) &vv ;
do over vv ; vv = 0 ;
do over ch ;
if ch =: vname (vv) then vv = sum (vv, nn) ;
end ;
end ;
run ; The correct output will look as follows: W P S
----------
15 45 14
15 84 34
70 12 0
56 87 67 Paul D.
... View more