I currently have two tables have1 and have2, and I want to determine if the YEAR corresponding to each NAME in have2 is within the time range of have1 [START END]. If marked as 1, otherwise it is 0. Any part of the YEAR can be within the time range.
I hope to match through the common CODE of two tables and obtain Want.
Thank you for your help!
Depending on the size of have1 using a hash object could solve the problem:
data work.want;
set work.have2;
length flag more 8;
if _n_ = 1 then do;
if 0 then set work.have1;
declare hash h(dataset: 'work.have1', multidata: 'yes');
h.defineKey('name', 'code');
h.defineData('start', 'end');
h.defineDone();
call missing(start, end);
end;
flag = 0;
if h.find() = 0 then do;
flag = year(start) <= year <= year(end);
h.has_next(result: more);
do while (more and not flag);
rc = h.find_next();
flag = (year(start) <= year <= year(end)) or flag;
h.has_next(result: more);
end;
end;
drop start end rc more;
run;
Here is the data:
data have1;
input start: YYMMDD8. end: YYMMDD8. name:$1. code;
format start YYMMDDn8. end YYMMDDn8.;
datalines;
20050712 20060531 A 83265
20070531 20080531 A 83265
20100624 20130812 B 79080
20090620 20110912 B 79080
20040526 20051109 C 35423
20051109 20100831 C 35423
20100831 20110930 C 35423
20140401 20160418 D 69105
20170419 20180424 E 53272
;
run;
data have2;
input year name:$1. code;
datalines;
2011 A 83265
2012 A 83265
2007 B 79080
2008 B 79080
2009 B 79080
2010 B 79080
2012 D 69105
2013 D 69105
2014 D 69105
2015 D 69105
2016 D 69105
;
run;
data want;
input year name:$1. code exist;
datalines;
2011 A 83265 0
2012 A 83265 0
2007 B 79080 0
2008 B 79080 0
2009 B 79080 1
2010 B 79080 1
2012 D 69105 0
2013 D 69105 0
2014 D 69105 1
2015 D 69105 1
2016 D 69105 1
;
run;
You already asked this question, and you marked an answer correct.
Yes. But they are in different situations, especially for have1. Each NAME and CODE corresponds to different time intervals and needs to be concatenated for judgment. The previous code cannot be implemented in this situation.
In the other thread, you mentioned that you have really huge data sets, and that the solution presented didn't work because you ran out of memory. Is that the case here as well?
Yes. HAVE2 has over 60000 rows and 20 columns, while HAVE1 has 1.5 million rows and 10 columns.
Here is a solution that uses indexed lookup - the index on the HAVE1 dataset should probably be created in the program that creates the table, not in the lookup program:
proc sql;
create index idx on have1(name,code);
quit;
data want;
set have2;
exist=0;
do name=name,' '; /* initialize for next read by calling with non-existing key, as keys repeat in HAVE2 */
do until(exist);
set have1 key=idx;
if _iorc_ then do;
_error_=0;
leave;
end;
exist=year(start)<=year<=year(end);
end;
if name ne ' ' then output;
end;
keep year name code exist;
run;
The good thing about this approach, when HAVE2 is the smaller table, is that we only access the records in HAVE1 that are actually relevant.
Here is another possibility, which requires sorting the big HAVE1 table. We can then create a view (or a table, but a view is probably faster in this case) containing the possible years in the correct order (this is of course a very good solution if your actual data, unlike your demonstration data, is actually sorted already):
proc sort data=have1;
by name code start;
run;
data have1_years/view=have1_years;
do until(last.code);
set have1;
by name code;
do year=max(year+1,year(start)) to year(end);
output;
end;
end;
keep name code year;
run;
data want;
merge have1_years(in=in1) have2(in=in2);
by name code year;
if in2;
exist=in1;
run;
The solution can also be done extracting NAME, CODE and YEAR from HAVE1, and then sort the output data afterwards. This may be a better solution if you have many columns in HAVE1:
data have1_years;
set have1;
do year=year(start) to year(end);
output;
end;
keep name code year;
run;
Proc sort nodupkey;
by name code year;
run;
@Alice_SAS wrote:
Yes. But they are in different situations, especially for have1. Each NAME and CODE corresponds to different time intervals and needs to be concatenated for judgment. The previous code cannot be implemented in this situation.
So just where is your statement of how intervals need to be concatenated and what rule(s) are applied for "judgement". I don't see anything about either action in the original problem description.
I currently have two tables have1 and have2, and I want to determine if the YEAR corresponding to each NAME in have2 is within the time range of have1 [START END]. If marked as 1, otherwise it is 0. Any part of the YEAR can be within the time range.
I hope to match through the common CODE of two tables and obtain Want.
Thank you for your help!
Or in the post with the example data.
If a previous solution does not work, you should post a reference to that "solution" and describe what has changed that makes it no longer suitable.
Depending on the size of have1 using a hash object could solve the problem:
data work.want;
set work.have2;
length flag more 8;
if _n_ = 1 then do;
if 0 then set work.have1;
declare hash h(dataset: 'work.have1', multidata: 'yes');
h.defineKey('name', 'code');
h.defineData('start', 'end');
h.defineDone();
call missing(start, end);
end;
flag = 0;
if h.find() = 0 then do;
flag = year(start) <= year <= year(end);
h.has_next(result: more);
do while (more and not flag);
rc = h.find_next();
flag = (year(start) <= year <= year(end)) or flag;
h.has_next(result: more);
end;
end;
drop start end rc more;
run;
What does it mean for a year to be within a time period?
Do you mean that the whole year is covered by the time period?
Period Start -------- End Year J---D
start <= mdy(1,1,year) and mdy(12,31,year) <= end
Or does only part of the year need to be covered?
Period Start -------- End Year J-----D Year J-----D
start <= mdy(12,31,year) and mdy(1,1,year) <= end
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.