Hi Guys,
I need to select the first date of a 2 variables combinations.
For example:
data:
number | pet | date
1 dog 12.11.2018
1 dog 18.6.2019
1 cat 1.1.2016
1 cat 6.3.2015
1 fish 2.2.2016
result wanted:
1 dog 12.11.2018
1 cat 6.3.2015
1 fish 2.2.2016
Please help,
Thank you!
Like this?
data have;
input number pet $ date:ddmmyy10.;
format date ddmmyy10.;
datalines;
1 dog 12.11.2018
1 dog 18.6.2019
1 cat 1.1.2016
1 cat 6.3.2015
1 fish 2.2.2016
;
proc sql;
create table want as
select * from have
group by number, pet
having date=min(date);
quit;
Like this?
data have;
input number pet $ date:ddmmyy10.;
format date ddmmyy10.;
datalines;
1 dog 12.11.2018
1 dog 18.6.2019
1 cat 1.1.2016
1 cat 6.3.2015
1 fish 2.2.2016
;
proc sql;
create table want as
select * from have
group by number, pet
having date=min(date);
quit;
Thanks a lot!
Alternatively by data step
proc sort data=have;
by number pet date;
run;
data want;
set have;
by number pet date;
if first.pet;
run;
This appears a nice question that can appreciate @Reeza 's diligence that eloquently entailed reasoning to various approaches FOR and AGAINST i.e remerge vs not to remerge in general in a thread ages ago
1. Need for a HAVING aka filter remerged content vs Direct summary
I suppose that's excellent forward thinking of @PeterClemmensen to envision the possibility of many variables besides grouping variable that would trigger a remerge and the need for HAVING.
Reeza covers this so well as she explains the double pass of SQL , Datastep that makes the inquisitive minded folks to think further.
Case 1. We have just grouping variable and analysis variable
Probable solutions: Proc sql direct summary, proc summary, means etc
proc sql; create table want as select number,pet, min(date) as date format=ddmmyy10. from have group by number,pet; quit;
Case 2. Grouping variable, analysis variables and other variables
Solutions: warrants HAVING coz the other will trigger remerge that requires grouped filter
proc sql; create table want as select * from have group by number, pet having date=min(date); quit;
Moving on to Datastep:
The Case1 would probably not require a sort assuming already sorted in some order as sample suggests we could just
data want ; do until(last.pet); set have(rename=date=_date); by number pet notsorted; date=min(date,_date); end; drop _:; format date ddmmyy10.; run;
However, again in Case 2 with many other vars will involve some gymnastics to park the associated variables of the min date somewhere and bring them back to output. Of course, this can be circumvented with various techniques without gymnastics with a double DOW/Interleave with SET and BY/ Parking in temp array/ Hash etc.
Nonetheless, the sorted approach by @Jagadishkatam is neat, convenient and easy to maintain.
Overall Long vs wide processing is the essence of the discussion. Cheers!
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.