@yoyong555:
Since your input is sorted by ID, both output data sets can be generated in a single step with 2 passes through the input data:
data have ;
  input ID $ score ;
  cards ;
1 10
1 10
2 5
3 20
3 21
3 20
4 15
4 18
5 .
6 8
6 8
7 17
7 17
7 25
;
run ;
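data group (keep = id score group)
     freq  (keep = group freq)
     ;
  /* Pass 1: read one BY group, counting its records in _N_ */
  do _n_ = 1 by 1 until (last.id) ;
    set have ;
    by id ;
    if _n_ = 1 then _score1 = score ;
    else if _score1 ne score then group = 2 ;
  end ;
  /* Between the passes: classify the BY group and count it */
  if _n_ = 1 then group = 3 ;
  else if group ne 2 then group = 1 ;
  array fq [3] _temporary_ ;
  fq [group] + 1 ;
  /* Pass 2: re-read the same _N_ records and write them out with GROUP */
  do _n_ = 1 to _n_ ;
    set have end = lr ;
    output group ;
  end ;
  /* After the last record: write the accumulated frequencies */
  if lr then do group = 1 to dim (fq) ;
    freq = fq [group] ;
    output freq ;
  end ;
run ;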
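If you want to check the result, a quick sketch like the one below prints both output data sets. It assumes the step above has already run and created GROUP and FREQ in WORK; the titles are just my own labels.

```sas
/* Record-level output: one row per input record, with GROUP added */
proc print data = group noobs ;
  title "Records with GROUP assigned" ;
run ;

/* Frequencies: one row per GROUP value (1, 2, 3) */
proc print data = freq noobs ;
  title "Number of BY groups per GROUP value" ;
run ;

/* Clear the title so it does not carry over to later output */
title ;
```

With the sample data above, FREQ should show 2 for GROUP=1 (IDs 1 and 6), 3 for GROUP=2 (IDs 3, 4 and 7), and 2 for GROUP=3 (IDs 2 and 5). Note that these are counts of BY groups (IDs), not of records.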
Each BY group is read twice:
On the first pass (the first DoW-loop): SCORE in the second and subsequent records is compared with SCORE from the first record, saved in _SCORE1. If there's a difference, GROUP=2.
After the first pass (between the two DoW-loops): (A) the program looks at the number of records _N_ from the current BY group. If _N_=1, GROUP=3; otherwise if GROUP is not 2, all scores in the BY group are the same, so GROUP=1. (B) Now that the GROUP value for the current BY group has been found, 1 is added to the array item whose index is equal to GROUP.
On the second pass (the second DoW-loop): Every record from the current BY group is written to file GROUP with the value of GROUP added.
After the second pass (after the second DoW-loop): If the last record from the second input stream (related to the second SET) has just been read, LR (short for "last record") is auto-set to 1 from its initial automatic value of 0. In this case, the frequencies are gathered from the array and written to file FREQ.
Program control goes back to the top of the step, enters the first DoW-loop, and executes the SET statement. Since all the records from the first input stream (related to the first SET) have already been read, there is nothing left to read. The attempt to read past the last record causes the DATA step to terminate.
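The two mechanics the explanation relies on (counting the records of a BY group in _N_, and the step stopping by itself when SET runs out of records) can be seen in isolation in a stripped-down sketch with a single DoW-loop. It assumes the data set HAVE from above; the names COUNTS and N_RECS are made up for illustration.

```sas
data counts (keep = id n_recs) ;
  /* One DoW-loop: each execution of the step consumes one whole BY group */
  do _n_ = 1 by 1 until (last.id) ;
    set have ;
    by id ;
  end ;
  /* UNTIL is checked before _N_ is incremented, so _N_ = records read */
  n_recs = _n_ ;
  /* No explicit OUTPUT: one row per BY group is written at the bottom */
run ;
```

With the sample data this should give 7 rows, one per ID, for example N_RECS=2 for ID 1 and N_RECS=3 for ID 3. No STOP statement is needed: after the last BY group, the next execution of SET finds no more records and the step ends.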
Simple!
Kind regards
Paul D.