Dear SAS Experts,
Need your help in figuring out how to compare dates in 2 datasets having same ID for multiple observations and selecting the latest as I am bit new to SAS and still learning
Table 1 has 3 columns
ID No Date
1 1 4-Apr-18
1 2 19-May-18
1 3 20-Jul-18
1 4 01-Aug-18
'
Table 2 has 3 columns
ID No Date
1 1 2-May-18
1 2 10-Jun-18
1 3 15-Jul-18
For every record in table1 with the same ID , i need to compare the date with table2 having same ID and ensure t1.date >=t2.date.
If there multiple records satisfying the condition from table 2 then take the latest and create a column valid with values Y or N, if condition not satisfied
e.g. for record 3 in table1 for ID=1 all the records from table2 satisfy the t1.date >=t2.date. in this case i have to pick the 15-jul-18 against that record. and put valid flag as Y
Thanks for all the help in advance.
Or
data want;
if _n_=1 then do;
if 0 then set have2;
declare hash H (dataset:'have2(rename=(date=date1))',multidata:'y') ;
h.definekey ("id") ;
h.definedata ("date1") ;
h.definedone () ;
call missing(date1);
declare hash H2 (dataset:'have2(drop=no rename=(date=date2))',multidata:'y') ;
h2.definekey ("id",'date2') ;
h2.defineData(all: 'y');
h2.definedone () ;
call missing(date2);
end;
do until(last.id);
set have1;
by id;
call missing(date2,acc);
do rc=h.find() by 0 while(rc=0);
if date>=date1 then date2=max(date1,date2);
rc=h.find_next();
end;
rc=h2.find(key:id,key:date2);
valid=ifc(date2=.,'N','Y');
output;
end;
format date: date9.;
drop date1 rc;
run;
This assumes, you have sufficient memory to have two hash tables
Can you post a sample of your requird output for your input sample to avoid assumptions plz
The desired output
Table 1 with additional columns
ID No Date t2.date valid
1 1 4-Apr-18 - N
1 2 19-May-18 2-May Y
1 3 20-Jul-18 15-Jul Y
1 4 01-Aug-18 15-Jul Y
Thanks
data have1;
input ID No Date :date9.;
format date date9.;
cards;
1 1 4-Apr-18
1 2 19-May-18
1 3 20-Jul-18
1 4 01-Aug-18
;
data have2;
input ID No Date :date9.;
format date date9.;
cards;
1 1 2-May-18
1 2 10-Jun-18
1 3 15-Jul-18
;
proc sql;
create table want as
select distinct a.id, a.date as date1,ifn(a.date >=b.date,b.date,.) as date2 format =date9.,
case when calculated date2=. then 'N' else 'Y' end as valid
from have1 a, have2 b
where a.id=b.id
group by a.id,a.date
having date2=max(date2);
quit;
data have1;
input ID No Date :date9.;
format date date9.;
cards;
1 1 4-Apr-18
1 2 19-May-18
1 3 20-Jul-18
1 4 01-Aug-18
;
data have2;
input ID No Date :date9.;
format date date9.;
cards;
1 1 2-May-18
1 2 10-Jun-18
1 3 15-Jul-18
;
data want;
if _n_=1 then do;
if 0 then set have2;
declare hash H (dataset:'have2(rename=(date=date1))',multidata:'y') ;
h.definekey ("id") ;
h.definedata ("date1") ;
h.definedone () ;
call missing(date1);
end;
do until(last.id);
set have1;
by id;
call missing(date2);
do rc=h.find() by 0 while(rc=0);
if date>=date1 then date2=max(date1,date2);
rc=h.find_next();
end;
valid=ifc(date2=.,'N','Y');
output;
end;
format date: date9.;
drop date1 rc;
run;
Thanks for the quick help. It works 🙂
I have started reading about the hash object to understand how it actually works.
One quick question, if the table2 has additional columns and I want the same in output as well
data have2;.
input ID No Acc Date :date9.;
format date date9.;
cards;
1 1 123 2-May-18
1 2 456 10-Jun-18
1 3 789 15-Jul-18
;
I modified h.definedata (all='Y') ;
I am getting 789 as ACC for all cases. After getting date2 , I need to get the ACC associated with that date
ID | No | ACC | Date | date2 | valid |
1 | 1 | 789 | 4-Apr-18 | . | N |
1 | 2 | 789 | 19-May-18 | 2-May-18 | Y |
1 | 3 | 789 | 20-Jul-18 | 15-Jul-18 | Y |
1 | 4 | 789 | 1-Aug-18 | 15-Jul-18 | Y |
instead of
ID | No | ACC | Date | date2 | valid |
1 | 1 | 4-Apr-18 | . | N | |
1 | 2 | 123 | 19-May-18 | 2-May-18 | Y |
1 | 3 | 789 | 20-Jul-18 | 15-Jul-18 | Y |
1 | 4 | 789 | 1-Aug-18 | 15-Jul-18 | Y |
Thanks.
That's simple. Can you plz post your complete modified sample or that's the best representative. Just making sure coz I can modify all in one shot
@smiles Good morning,Welcome to SAS forum and my apologies for the delay. Obviously, I am sure you understand the time difference impact between locations where we login from. Messaging you @ chicago time 9:10am with train/bus delays having just made it to my lab and this is my second post for the day. Earlier, I basically acknowledged your message while waking up.
Here you go:
data have1;
input ID No Date :date9.;
format date date9.;
cards;
1 1 4-Apr-18
1 2 19-May-18
1 3 20-Jul-18
1 4 01-Aug-18
;
data have2;
input ID No Acc Date :date9.;
format date date9.;
cards;
1 1 123 2-May-18
1 2 456 10-Jun-18
1 3 789 15-Jul-18
;
data want;
if _n_=1 then do;
if 0 then set have2;
declare hash H (dataset:'have2(drop=no rename=(date=date1))',multidata:'y',ordered:'y') ;
h.definekey ("id") ;
h.defineData(all: 'y');;
h.definedone () ;
call missing(date1);
end;
do until(last.id);
set have1;
by id;
call missing(date2);
do rc=h.find() by 0 while(rc=0);
if date>=date1 then date2=max(date1,date2);
rc=h.find_next();
end;
if date2 eq . then call missing(acc);
else
do rc=h.find() by 0 while(rc=0);
if date2=date1 then leave;
rc=h.find_next();
end;
valid=ifc(date2=.,'N','Y');
output;
end;
format date: date9.;
drop date1 rc;
run;
Some NOTES:
1. if date2 eq . then call missing(acc); This assumes you have one additional var, however if you have a very wide have2 and you wanna use all:y in definedata you may not know to group all that needs to be part of the call missing(var1,var2--varn).
Call missing(a--z) can work if you know sequence of your definedata
2. No. variable overwrites have1 No. variable during the lookup, therefore I dropped that out of the hash table to maintain sequence order of have1 as is. Of course, you can reassign sequence, learning that No in have2 will overwrite, however there's no point
3. The more clear and best representative your sample is, the better to think through the robust solutions
Feel free to have any follow up questions, if you may.
Or
data want;
if _n_=1 then do;
if 0 then set have2;
declare hash H (dataset:'have2(rename=(date=date1))',multidata:'y') ;
h.definekey ("id") ;
h.definedata ("date1") ;
h.definedone () ;
call missing(date1);
declare hash H2 (dataset:'have2(drop=no rename=(date=date2))',multidata:'y') ;
h2.definekey ("id",'date2') ;
h2.defineData(all: 'y');
h2.definedone () ;
call missing(date2);
end;
do until(last.id);
set have1;
by id;
call missing(date2,acc);
do rc=h.find() by 0 while(rc=0);
if date>=date1 then date2=max(date1,date2);
rc=h.find_next();
end;
rc=h2.find(key:id,key:date2);
valid=ifc(date2=.,'N','Y');
output;
end;
format date: date9.;
drop date1 rc;
run;
This assumes, you have sufficient memory to have two hash tables
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.