BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
efi88
Fluorite | Level 6

Hi Guys,

I need to select the first date of a 2 variables combinations.

For example:

data:

number | pet | date

1 dog  12.11.2018

1 dog 18.6.2019

1 cat 1.1.2016

1 cat 6.3.2015

1 fish  2.2.2016

 

result wanted:

1 dog  12.11.2018

1 cat 6.3.2015

1 fish  2.2.2016

 

Please help,

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Like this?

 

data have;
input number pet $ date:ddmmyy10.;
format date ddmmyy10.;
datalines;
1 dog  12.11.2018
1 dog 18.6.2019
1 cat 1.1.2016
1 cat 6.3.2015
1 fish  2.2.2016
;

proc sql;
   create table want as
   select * from have
   group by number, pet
   having date=min(date);
quit;

View solution in original post

4 REPLIES 4
PeterClemmensen
Tourmaline | Level 20

Like this?

 

data have;
input number pet $ date:ddmmyy10.;
format date ddmmyy10.;
datalines;
1 dog  12.11.2018
1 dog 18.6.2019
1 cat 1.1.2016
1 cat 6.3.2015
1 fish  2.2.2016
;

proc sql;
   create table want as
   select * from have
   group by number, pet
   having date=min(date);
quit;
Jagadishkatam
Amethyst | Level 16

Alternatively by data step

 

proc sort data=have;
by number pet date;
run;

data want;
set have;
by number pet date;
if first.pet;
run;
Thanks,
Jag
novinosrin
Tourmaline | Level 20

This appears a nice question that can appreciate @Reeza  's diligence that eloquently entailed reasoning to various approaches FOR and AGAINST i.e remerge vs not to remerge in general in a thread ages ago

 

 

1. Need for a HAVING aka filter remerged content vs Direct summary 

 

I suppose that's excellent  forward thinking of @PeterClemmensen  to envision the possibility of many variables besides grouping variable that would trigger a remerge and the need for HAVING. 

 

Reeza covers this so well as she explains the double pass of SQL , Datastep that makes the inquisitive minded folks to think further. 

 

Case 1. We have just grouping variable and analysis variable

Probable solutions: Proc sql direct summary, proc summary, means etc

 

proc sql;
create table want as
select number,pet, min(date) as date format=ddmmyy10.
from have
group by number,pet;
quit;

 

Case 2. Grouping variable, analysis variables and other variables

Solutions: warrants HAVING coz  the other will trigger remerge that requires grouped filter

proc sql;
   create table want as
   select * from have
   group by number, pet
   having date=min(date);
quit;

 

Moving on to Datastep:

 

The Case1 would probably not require a sort assuming already sorted in some order as sample suggests we could just

data want ;
do until(last.pet);
set have(rename=date=_date);
by number pet notsorted;
date=min(date,_date);
end;
drop _:;
format date ddmmyy10.;
run;

However, again in Case 2 with many other vars will involve some gymnastics to park the associated variables of the min date somewhere and bring them back to output.  Of course, this can be circumvented with various techniques without gymnastics with a double DOW/Interleave with SET and BY/ Parking in temp array/ Hash etc. 

 

Nonetheless, the sorted approach by @Jagadishkatam  is neat, convenient and easy to maintain. 

 

Overall Long vs wide processing is the essence of the discussion. Cheers!

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 550 views
  • 2 likes
  • 4 in conversation