I have two datasets, current -- which is my main file containing the visit_dates I want to match on, and previous which contains other visit_dates (not necessarily always prior to the current dataset).
I want to create a dataset that only keeps the most recent last_date prior to the visit_date in the current dataset. If there are no prior visits, then last_date should be empty.
What is the best way to approach this problem?
data current; infile cards; input id visit_date; attrib visit_date length=4 informat=date9. format=ddmmyy10.; cards; 1 01nov14 1 11nov14 2 22nov15 2 24nov15 3 01nov16 4 01nov16 ; run; data previous; infile cards; input id last_date; attrib last_date length=4 informat=date9. format=ddmmyy10.; cards; 1 01jan14 1 11feb14 1 02jan15 1 01nov14 2 01nov15 2 22nov15 2 23nov15 3 01nov16 4 12nov16 ; run; data want; infile cards; input id visit_date last_date; attrib visit_date last_date length=4 informat=date9. format=ddmmyy10.; cards; 1 01nov14 11feb14 1 11nov14 01nov14 2 22nov15 01nov15 2 24nov15 23nov15 3 01nov16 . 4 01nov16 . ; run;
Hi Fettah
Here's one way to do it: first I did a Cartesian Join (work.step1) via PROC SQL, then use the SAS data step by group processing to get the records you want. The results dataset has a new variable result_date which is the variable you want (just use a drop and rename, to drop last_date and rename result_date to last_date), I left that out to make it clearer to see how it is working.
Andy
proc sql ;
create table work.step1 as
select
current.*,
previous.*
from
work.current as current,
work.previous as previous
where
previous.id=current.id
order by
current.id,
current.visit_date,
previous.last_date
;
quit ;
data results ;
retain result_date ;
attrib result_date length=4 informat=date9. format=ddmmyy10.;
set work.step1 ;
by id visit_date ;
if first.visit_date then
result_date=. ;
if visit_date > last_date then
result_date=last_date ;
if last.visit_date then
output ;
run ;
something like this
proc sql;
create table want as
select a.id, visit_date format=date7., max(last_date) as last_date format=date7.
from current a
left join
previous b
on a.id =b.id
and visit_date > last_date
group by 1,2;
SQL is by far the best approach i think however hash is fun
data current;
infile cards;
input id visit_date;
attrib visit_date length=4 informat=date9. format=ddmmyy10.;
cards;
1 01nov14
1 11nov14
2 22nov15
2 24nov15
3 01nov16
4 01nov16
;
run;
data previous;
infile cards;
input id last_date;
attrib last_date length=4 informat=date9. format=ddmmyy10.;
cards;
1 01jan14
1 11feb14
1 02jan15
1 01nov14
2 01nov15
2 22nov15
2 23nov15
3 01nov16
4 12nov16
;
run;
data want;
if _n_=1 then do;
if 0 then do; set current;set previous;end;
declare hash h(dataset: 'previous',multidata:'y');
h.definekey('id');
h.definedata('last_date');
h.definedone();
end;
set current;
rc=h.find();
do while(rc = 0);
if visit_date>last_date then _t=max( _t, last_date);
rc = h.find_next();
end;
last_date=_t;
drop rc _:;
run;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.