This is a nice question, and it is a good occasion to appreciate @Reeza's diligence: in a thread ages ago she eloquently laid out the reasoning FOR and AGAINST the various approaches, i.e. remerge vs. no remerge in general.
1. The need for HAVING (i.e. filtering remerged content) vs. a direct summary
I suppose it was excellent forward thinking by @PeterClemmensen to envision that there may be many variables besides the grouping variables, which would trigger a remerge and hence the need for HAVING.
Reeza covers this well when she explains the double pass in both SQL and the DATA step, which gets inquisitive folks thinking further.
Case 1. We have just the grouping variables and the analysis variable
Probable solutions: PROC SQL direct summary, PROC SUMMARY, PROC MEANS, etc.
proc sql;
create table want as
select number,pet, min(date) as date format=ddmmyy10.
from have
group by number,pet;
quit;
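For comparison, here is a minimal sketch of the PROC SUMMARY route mentioned above. It assumes the same HAVE data set with NUMBER, PET and DATE, and that DATE already carries the desired format in HAVE (the MIN statistic inherits the format of the analysis variable). Unlike the SQL above, rows with a missing NUMBER or PET are left out unless the MISSING option is added.

```sas
proc summary data=have nway;          /* NWAY keeps only the full number*pet combinations */
   class number pet;                  /* grouping variables; no prior sort is needed */
   var date;                          /* analysis variable */
   output out=want(drop=_type_ _freq_) min=date;   /* one row per group holding the earliest date */
run;
```

Either way the result is one row per NUMBER/PET group, so no remerge is involved.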
Case 2. Grouping variables, the analysis variable and other variables
Solution: this warrants HAVING, because the other variables trigger a remerge, and the remerged rows then need a filter within each group
proc sql;
create table want as
select * from have
group by number, pet
having date=min(date);
quit;
Moving on to the DATA step:
Case 1 would probably not require a sort, assuming the data are already grouped in some order as the sample suggests, so we could just use
data want ;
do until(last.pet);
set have(rename=(date=_date));
by number pet notsorted;
date=min(date,_date);
end;
drop _:;
format date ddmmyy10.;
run;
However, Case 2, with its many other variables, again involves some gymnastics to park the variables associated with the min date somewhere and bring them back for output. Of course, the gymnastics can be avoided with various techniques: a double DOW loop, an interleave with SET and BY, parking the values in a temporary array, a hash object, etc.
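As an illustration of the double DOW idea, here is a minimal sketch, assuming the same HAVE data set grouped by NUMBER and PET as in the sample. The helper variable _MIN is a name introduced only for this example. The first loop finds the minimum date of the group; the second loop re-reads the same group and outputs the matching rows with all their other variables intact.

```sas
data want;
   /* Pass 1: find the minimum date within the current number/pet group */
   do until (last.pet);
      set have;
      by number pet notsorted;
      _min = min(_min, date);         /* _min starts as missing for every group */
   end;
   /* Pass 2: re-read the same group and keep the rows that hold the minimum */
   do until (last.pet);
      set have;
      by number pet notsorted;
      if date = _min then output;
   end;
   drop _min;
run;
```

Like the HAVING query, this keeps every row that ties for the minimum date within a group.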
Nonetheless, the sorted approach by @Jagadishkatam is neat, convenient and easy to maintain.
Overall, long vs. wide processing is the essence of the discussion. Cheers!