I currently have two tables have1 and have2, and I want to determine if the YEAR corresponding to each NAME in have2 is within the time range of have1 [START END]. If marked as 1, otherwise it is 0.
I hope to match through the common CODE of two tables and obtain Want.
Thank you for your help!
proc sql;
create table want as select b.*,
case when mdy(1,1,year) between start and end
or mdy(12,31,year) between start and end then 1 else 0 end as exist
from have1 as a right join have2 as b
on a.code=b.code and a.name=b.name;
quit;
Here is the data:
data have1;
input start: YYMMDD8. end: YYMMDD8. name:$1. code;
format start YYMMDDn8. end YYMMDDn8.;
datalines;
20050712 20060531 A 83265
20100624 20130812 B 79080
20040526 20051109 C 35423
20140401 20160418 D 69105
20170419 20180424 E 53272
;
run;
data have2;
input year name:$1. code;
datalines;
2011 A 83265
2012 A 83265
2007 B 79080
2008 B 79080
2009 B 79080
2010 B 79080
2012 D 69105
2013 D 69105
2014 D 69105
2015 D 69105
2016 D 69105
;
run;
I want to obtain:
data want;
input year name:$1. code exist;
datalines;
2011 A 83265 0
2012 A 83265 0
2007 B 79080 0
2008 B 79080 0
2009 B 79080 0
2010 B 79080 1
2012 D 69105 0
2013 D 69105 0
2014 D 69105 1
2015 D 69105 1
2016 D 69105 1
;
run;
Do you mean that ANY PART of the year in HAVE2 is contained between the dates in HAVE1?
Or do you mean that ALL of the year in HAVE2 is contained between the dates in HAVE1?
Or do you mean something else?
My idea is that for have2, first I need to determine whether there is a corresponding CODE in have1. If so, I will then separately determine whether each year is within the time range in have1.
For example, for the first line in have2, I first determine that CODE 83265 is within have1, and then I verify whether year 2011 is within the time range of 20050712 20060531. Obviously, it is not, so I let exist = 0.
Thanks, but I don't think you have answered my questions clearly.
Do you mean that ANY PART of the year in HAVE2 is contained between the dates in HAVE1?
Or do you mean that ALL of the year in HAVE2 is contained between the dates in HAVE1?
Or do you mean something else?
I understand your question. Any PART is OK, thank you very much!
proc sql;
create table want as select b.*,
case when mdy(1,1,year) between start and end
or mdy(12,31,year) between start and end then 1 else 0 end as exist
from have1 as a right join have2 as b
on a.code=b.code and a.name=b.name;
quit;
Hello @PaigeMiller,
I had a very similar idea, but then realized that it would fail if start and end were dates in the same year, e.g., from April and October, respectively. Therefore, the suggestion I had prepared used the criterion year(start)<=year<=year(end), possibly in a CASE-WHEN expression if the case "code not found in HAVE1" should be distinguished from "code found, but year criterion not met":
proc sql;
create table want as
select a.*, case when missing(b.code) then .
else year(start)<=year<=year(end)
end as exist
from have2 a left join have1 b
on a.code=b.code
order by name, year;
quit;
Thank you for your answer. Although you solved my problem. But in reality, my tables have1 and have2 are both very large. I used your method but the operation failed because there was no memory. I think the problem lies in mdy(1,1, year). Is there a more efficient method? Thanks!
MDY has nothing to do with this issue. The issue is that you don't have enough memory because the tables are so large.
Please give us the information about how many rows and columns are in each data set.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.