Solved: How to compare dates from two datasets having same ID but multiple obs...

smiles · Posted 08-26-2018 11:52 AM

Dear SAS Experts,

Need your help in figuring out how to compare dates in 2 datasets having same ID for multiple observations and selecting the latest as I am bit new to SAS and still learning

Table 1 has 3 columns

ID No Date

1 1 4-Apr-18

1 2 19-May-18

1 3 20-Jul-18

1 4 01-Aug-18

'

Table 2 has 3 columns

ID No Date

1 1 2-May-18

1 2 10-Jun-18

1 3 15-Jul-18

For every record in table1 with the same ID , i need to compare the date with table2 having same ID and ensure t1.date >=t2.date.

If there multiple records satisfying the condition from table 2 then take the latest and create a column valid with values Y or N, if condition not satisfied

e.g. for record 3 in table1 for ID=1 all the records from table2 satisfy the t1.date >=t2.date. in this case i have to pick the 15-jul-18 against that record. and put valid flag as Y

Thanks for all the help in advance.

novinosrin · Posted 08-27-2018 11:14 AM

Or

data want;
if _n_=1 then do;
if 0 then set have2;
  declare hash H (dataset:'have2(rename=(date=date1))',multidata:'y') ;
   h.definekey  ("id") ;
   h.definedata ("date1") ;
   h.definedone () ;
   call missing(date1);
   declare hash H2 (dataset:'have2(drop=no rename=(date=date2))',multidata:'y') ;
   h2.definekey  ("id",'date2') ;
   h2.defineData(all: 'y');
   h2.definedone () ;
   call missing(date2);
end;
do until(last.id);
set have1;
by id;
call missing(date2,acc);
do rc=h.find() by 0 while(rc=0);
if date>=date1 then date2=max(date1,date2);
rc=h.find_next();
end;
rc=h2.find(key:id,key:date2);
valid=ifc(date2=.,'N','Y');
output;
end;
format date: date9.;
drop date1 rc;
run;

This assumes, you have sufficient memory to have two hash tables

View solution in original post

novinosrin · Posted 08-26-2018 12:13 PM

Can you post a sample of your requird output for your input sample to avoid assumptions plz

smiles · Posted 08-26-2018 12:21 PM

The desired output

Table 1 with additional columns

ID No Date t2.date valid

1 1 4-Apr-18 - N

1 2 19-May-18 2-May Y

1 3 20-Jul-18 15-Jul Y

1 4 01-Aug-18 15-Jul Y

Thanks

novinosrin · Posted 08-26-2018 01:27 PM



 data have1;
 input ID   No    Date :date9.;
 format date date9.;
cards;
1     1      4-Apr-18
1     2     19-May-18
1     3      20-Jul-18
1     4       01-Aug-18
;

data have2;
input ID   No    Date :date9.;
 format date date9.;
 cards;
1     1      2-May-18
1     2     10-Jun-18
1     3      15-Jul-18
;

proc sql;
create table want as
select distinct a.id, a.date as date1,ifn(a.date >=b.date,b.date,.) as date2	format =date9.,
case when calculated date2=. then 'N' else 'Y' end as valid
from have1 a, have2 b
where a.id=b.id
group by a.id,a.date 
having date2=max(date2);

quit;

novinosrin · Posted 08-26-2018 05:40 PM

 data have1;
 input ID   No    Date :date9.;
 format date date9.;
cards;
1     1      4-Apr-18
1     2     19-May-18
1     3      20-Jul-18
1     4       01-Aug-18
;

data have2;
input ID   No    Date :date9.;
 format date date9.;
 cards;
1     1      2-May-18
1     2     10-Jun-18
1     3      15-Jul-18
;
data want;
if _n_=1 then do;
if 0 then set have2;
  declare hash H (dataset:'have2(rename=(date=date1))',multidata:'y') ;
   h.definekey  ("id") ;
   h.definedata ("date1") ;
   h.definedone () ;
   call missing(date1);
end;
do until(last.id);
set have1;
by id;
call missing(date2);
do rc=h.find() by 0 while(rc=0);
if date>=date1 then date2=max(date1,date2);
rc=h.find_next();
end;
valid=ifc(date2=.,'N','Y');
output;
end;
format date: date9.;
drop date1 rc;
run;

smiles · Posted 08-27-2018 03:43 AM

Thanks for the quick help. It works 🙂

I have started reading about the hash object to understand how it actually works.

One quick question, if the table2 has additional columns and I want the same in output as well

data have2;.
input ID No Acc Date :date9.;
format date date9.;
cards;
1 1 123 2-May-18
1 2 456 10-Jun-18
1 3 789 15-Jul-18
;

I modified h.definedata (all='Y') ;

I am getting 789 as ACC for all cases. After getting date2 , I need to get the ACC associated with that date

ID	No	ACC	Date	date2	valid
1	1	789	4-Apr-18	.	N
1	2	789	19-May-18	2-May-18	Y
1	3	789	20-Jul-18	15-Jul-18	Y
1	4	789	1-Aug-18	15-Jul-18	Y

instead of

ID	No	ACC	Date	date2	valid
1	1		4-Apr-18	.	N
1	2	123	19-May-18	2-May-18	Y
1	3	789	20-Jul-18	15-Jul-18	Y
1	4	789	1-Aug-18	15-Jul-18	Y

Thanks.

novinosrin · Posted 08-27-2018 07:15 AM

That's simple. Can you plz post your complete modified sample or that's the best representative. Just making sure coz I can modify all in one shot

smiles · Posted 08-27-2018 07:20 AM

That's it. This is the case.

Really appreciate your help. I am beginner so getting to learn . Thanks

novinosrin · Posted 08-27-2018 10:49 AM

@smiles Good morning,Welcome to SAS forum and my apologies for the delay. Obviously, I am sure you understand the time difference impact between locations where we login from. Messaging you @ chicago time 9:10am with train/bus delays having just made it to my lab and this is my second post for the day. Earlier, I basically acknowledged your message while waking up.

Here you go:

data have1;
 input ID   No    Date :date9.;
 format date date9.;
cards;
1     1      4-Apr-18
1     2     19-May-18
1     3      20-Jul-18
1     4       01-Aug-18
;


data have2;
input ID No Acc Date :date9.;
format date date9.;
cards;
1 1 123 2-May-18
1 2 456 10-Jun-18
1 3 789 15-Jul-18
;

data want;
if _n_=1 then do;
if 0 then set have2;
  declare hash H (dataset:'have2(drop=no rename=(date=date1))',multidata:'y',ordered:'y') ;
   h.definekey  ("id") ;
  h.defineData(all: 'y');;
   h.definedone () ;
   call missing(date1);
end;

do until(last.id);
set have1;
by id;
call missing(date2);
do rc=h.find() by 0 while(rc=0);
if date>=date1 then date2=max(date1,date2);
rc=h.find_next();
end;
if date2 eq . then call missing(acc);
else
do rc=h.find() by 0 while(rc=0);
if date2=date1 then leave;
rc=h.find_next();
end;
valid=ifc(date2=.,'N','Y');
output;
end;
format date: date9.;
drop date1 rc;
run;

Some NOTES:

1. if date2 eq . then call missing(acc); This assumes you have one additional var, however if you have a very wide have2 and you wanna use all:y in definedata you may not know to group all that needs to be part of the call missing(var1,var2--varn).

Call missing(a--z) can work if you know sequence of your definedata

2. No. variable overwrites have1 No. variable during the lookup, therefore I dropped that out of the hash table to maintain sequence order of have1 as is. Of course, you can reassign sequence, learning that No in have2 will overwrite, however there's no point

3. The more clear and best representative your sample is, the better to think through the robust solutions

Feel free to have any follow up questions, if you may.

novinosrin · Posted 08-27-2018 11:14 AM

Or

data want;
if _n_=1 then do;
if 0 then set have2;
  declare hash H (dataset:'have2(rename=(date=date1))',multidata:'y') ;
   h.definekey  ("id") ;
   h.definedata ("date1") ;
   h.definedone () ;
   call missing(date1);
   declare hash H2 (dataset:'have2(drop=no rename=(date=date2))',multidata:'y') ;
   h2.definekey  ("id",'date2') ;
   h2.defineData(all: 'y');
   h2.definedone () ;
   call missing(date2);
end;
do until(last.id);
set have1;
by id;
call missing(date2,acc);
do rc=h.find() by 0 while(rc=0);
if date>=date1 then date2=max(date1,date2);
rc=h.find_next();
end;
rc=h2.find(key:id,key:date2);
valid=ifc(date2=.,'N','Y');
output;
end;
format date: date9.;
drop date1 rc;
run;

This assumes, you have sufficient memory to have two hash tables

smiles · Posted 08-28-2018 11:11 AM

Thanks a lot!! Your fixed the problem in such a jiffy which I have been struggling with

How to compare dates from two datasets having same ID but multiple observations

Re: How to compare dates from two datasets having same ID but multiple observations

Re: How to compare dates from two datasets having same ID but multiple observations

Re: How to compare dates from two datasets having same ID but multiple observations

Re: How to compare dates from two datasets having same ID but multiple observations

Re: How to compare dates from two datasets having same ID but multiple observations

Re: How to compare dates from two datasets having same ID but multiple observations

Re: How to compare dates from two datasets having same ID but multiple observations

Re: How to compare dates from two datasets having same ID but multiple observations

Re: How to compare dates from two datasets having same ID but multiple observations

Re: How to compare dates from two datasets having same ID but multiple observations

Re: How to compare dates from two datasets having same ID but multiple observations

SAS Innovate 2025: Call for Content

Classroom Training Available!