Hi,
I am trying to merge two data sets. One of the datasets looks like:
ID testdate1
1 09/30/2015
1 06/15/2016
1 01/05/2017
2 03/20/2017
2 04/22/2017
The number of records for each person varies.
The 2nd data set looks like:
ID testdate2
1 07/06/2016
1 01/02/2017
2 05/19/2017
2 10/20/2017
I can only match them using ID. What I want is:
ID testdate1 testdate2 Indicator
1 06/15/2016 07/06/2016 1
1 01/05/2017 01/02/2017 1
If the interval between testdate1 and testdate2 is greater than 6 months, an indicator variable will be coded 1, otherwise the indicator variable is coded 0.
When I used the ID to merge the two datasets, it turns out to be
ID testdate1 testdate2 Indicator
1 09/30/2015 07/06/2016 0
1 06/15/2016 01/02/2017 0
1 01/05/2017 01/02/2017 1
In the merged dataset above, the first person is mismatched. I want testdate1=06/15/2016 to be matched with 07/06/2016 and testdate1=09/30/2015 should not be matched with any records in data set 2.
Thanks!
Show us your code and log. Let's see if we can modify it. If nothing works out, we shall work on a new solution
My code is simple, but I don't think it is correct.
proc sort data=one;
by ID descending testdate1;
run;
proc sort data=two;
by ID descending testdate2;
run;
data three;
merge one two;
by ID;
run;
I created another variable in data set 1 and 2 by counting the number of observations for each person and then merge the two datasets using ID and the count. I am not sure whether this is correct.
data one1;
set one;
by ID descending testdate1;
if first.ID then count=0;
count+1;
run;
data two1;
set two;
by ID descending testdate2;
if first.ID then count=0;
count+1;
run;
data all;
merge one1 two1;
by ID count;
run;
seems like sql is better for this this:
data have1;
input iD testdate1 :mmddyy10.;
format testdate1 mmddyy10.;
datalines;
1 09/30/2015
1 06/15/2016
1 01/05/2017
2 03/20/2017
2 04/22/2017
;
data have2;
input iD testdate1 :mmddyy10.;
format testdate1 mmddyy10.;
datalines;
1 07/06/2016
1 01/02/2017
2 05/19/2017
2 10/20/2017
;
Try and let me know
proc sql;
create table want as
select *,intck('month',testdate1,testdate2)>6 as indicator
from have1 a, have2(rename=(testdate1=testdate2)) b
where a.id=b.id;
quit;
My description of the problem is not clear. Let me try again. What I really want is an indicator variable in the merged data set with a code of 1 indicating the time interval between testdate2 and testdate1 is less than 6 months and a code of 0 if the time interval between testdate2 and testdate1 is more than 6 months.
changing do lt (less than<
proc sql;
create table want as
select *,intck('month',testdate1,testdate2)<6 as indicator
from have1 a, have2(rename=(testdate1=testdate2)) b
where a.id=b.id;
quit;
Thanks for your help! It doesn't work. Testdate1 and testdate2 are not in the same data set.
Thanks for your input! Problem is solved.
code copied from @novinosrin , add a case statement.
proc sql;
create table want as
select *, case when intck('month',a.testdate1,b.testdate2)<6 then 1 else 0 end as indicatior
from have1 a, have2 b
where a.id=b.id;
quit;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.