Merge dataset with dates, but only keep most recent date from 2nd dataset

Reply
Occasional Contributor
Posts: 13

Merge dataset with dates, but only keep most recent date from 2nd dataset

I have two datasets, current -- which is my main file containing the visit_dates I want to match on, and previous which contains other visit_dates (not necessarily always prior to the current dataset).

 

I want to create a dataset that only keeps the most recent last_date prior to the visit_date in the current dataset. If there are no prior visits, then last_date should be empty.

 

What is the best way to approach this problem?

 

data current;
infile cards;
input id visit_date;
attrib visit_date length=4 informat=date9. format=ddmmyy10.;
cards;
1 01nov14 
1 11nov14
2 22nov15 
2 24nov15
3 01nov16 
4 01nov16
;
run;

data previous;
infile cards;
input id last_date;
attrib last_date length=4 informat=date9. format=ddmmyy10.;
cards;
1 01jan14 
1 11feb14
1 02jan15
1 01nov14
2 01nov15 
2 22nov15
2 23nov15
3 01nov16 
4 12nov16
;
run;

data want;
infile cards;
input id visit_date last_date;
attrib visit_date last_date length=4 informat=date9. format=ddmmyy10.;
cards;
1 01nov14 11feb14
1 11nov14 01nov14
2 22nov15 01nov15
2 24nov15 23nov15
3 01nov16 .
4 01nov16 .
;
run;

 

 

Frequent Learner
Posts: 1

Re: Merge dataset with dates, but only keep most recent date from 2nd dataset

Hi Fettah

 

Here's one way to do it: first I did a Cartesian Join (work.step1) via PROC SQL, then use the SAS data step by group processing to get the records you want. The results dataset has a new variable result_date which is the variable you want (just use a drop and rename, to drop last_date and rename result_date to last_date), I left that out to make it clearer to see how it is working.

 

Andy

 

 

proc sql ;
create table work.step1 as
select
current.*,
previous.*
from
work.current as current,
work.previous as previous
where
previous.id=current.id
order by
current.id,
current.visit_date,
previous.last_date
;
quit ;

 

data results ;
retain result_date ;
attrib result_date length=4 informat=date9. format=ddmmyy10.;
set work.step1 ;
by id visit_date ;
if first.visit_date then
result_date=. ;
if visit_date > last_date then
result_date=last_date ;
if last.visit_date then
output ;
run ;

PROC Star
Posts: 514

Re: Merge dataset with dates, but only keep most recent date from 2nd dataset

something like this

proc sql;
create table want as 
select  a.id, visit_date format=date7., max(last_date) as last_date format=date7.
from current a
left join
previous b
on a.id =b.id
and visit_date > last_date
group by 1,2;
PROC Star
Posts: 1,845

Re: Merge dataset with dates, but only keep most recent date from 2nd dataset

SQL is by far the best approach i think however hash is fun

data current;
infile cards;
input id visit_date;
attrib visit_date length=4 informat=date9. format=ddmmyy10.;
cards;
1 01nov14 
1 11nov14
2 22nov15 
2 24nov15
3 01nov16 
4 01nov16
;
run;

data previous;
infile cards;
input id last_date;
attrib last_date length=4 informat=date9. format=ddmmyy10.;
cards;
1 01jan14 
1 11feb14
1 02jan15
1 01nov14
2 01nov15 
2 22nov15
2 23nov15
3 01nov16 
4 12nov16
;
run;
data want;
if _n_=1 then do;
if 0 then do; set current;set previous;end;
  declare hash h(dataset: 'previous',multidata:'y');
  h.definekey('id');
  h.definedata('last_date');
  h.definedone();
 end;
 set current;
 rc=h.find();
 do while(rc = 0);
  if visit_date>last_date then  _t=max( _t, last_date);
  rc = h.find_next();
 end;
 last_date=_t;
 drop rc _:;
 run;
Ask a Question
Discussion stats
  • 3 replies
  • 137 views
  • 2 likes
  • 4 in conversation