So I have a hash with id and date as its keys. I'd like to use these two keys to find data in "set" dataset. If the data can not be found using id and date, I'd like to see if it can be found using id and date+1. If not then try id and date+2, id and date+3 ... up to date+10. I'm not sure how to achieve this "if then" effect and how to iterate over hash keys.
The dataset is really large, so I was suggested to use Hash rather than proc sql. Any idea?
Thanks!!
I'm not really sure that I understand what you have. What I've built on is: You have a very large data set and you have a not so large data set for look-up with {id,date}. You need to determine if an {id, date} combination in your very large data set matches with {id,date} in your lookup table - and you want to use a range from date to date+10 for this check.
The following code assumes that your large table is stored as a SAS dataset and not as a database table (else please tell us).
data have_rathersmall_lookup;
do id=1,4,6;
do date='25sep2016'd,'01oct2016'd;
output;
end;
end;
format date date9.;
run;
data have_large;
do id=1 to 7;
do date='01sep2016'd to '30oct2016'd;
output;
end;
end;
format date date9.;
run;
data want;
set have_large;
if _n_=1 then
do;
/* mapping step to ensure variables used in the hash have been defined */
if 0 then set have_rathersmall_lookup(keep=id date);
/* define hash */
dcl hash h1(dataset:'have_rathersmall_lookup(keep=id date)', duplicate:'r');
h1.defineKey('id','date');
h1.defineDone();
end;
found_flg=0;
do date_inc=0 to 9;
if h1.check(key:id, key:date+date_inc)=0 then
do;
found_flg=1;
leave;
end;
end;
if date_inc=10 then call missing(date_inc);
run;
added comments to code for clarity based on follow up question
I'm not really sure that I understand what you have. What I've built on is: You have a very large data set and you have a not so large data set for look-up with {id,date}. You need to determine if an {id, date} combination in your very large data set matches with {id,date} in your lookup table - and you want to use a range from date to date+10 for this check.
The following code assumes that your large table is stored as a SAS dataset and not as a database table (else please tell us).
data have_rathersmall_lookup;
do id=1,4,6;
do date='25sep2016'd,'01oct2016'd;
output;
end;
end;
format date date9.;
run;
data have_large;
do id=1 to 7;
do date='01sep2016'd to '30oct2016'd;
output;
end;
end;
format date date9.;
run;
data want;
set have_large;
if _n_=1 then
do;
/* mapping step to ensure variables used in the hash have been defined */
if 0 then set have_rathersmall_lookup(keep=id date);
/* define hash */
dcl hash h1(dataset:'have_rathersmall_lookup(keep=id date)', duplicate:'r');
h1.defineKey('id','date');
h1.defineDone();
end;
found_flg=0;
do date_inc=0 to 9;
if h1.check(key:id, key:date+date_inc)=0 then
do;
found_flg=1;
leave;
end;
end;
if date_inc=10 then call missing(date_inc);
run;
added comments to code for clarity based on follow up question
Hi That's very helpful. But what does the line "if 0 then set want_lookup" mean and what is the want_lookup table?
I tried this code, and it outputs a dataset that includes all the data. What I'd prefer is a dataset that include one indicator of the earlist date it is found. For example, if id and date are found in date0 ,there is no need to check whether it can be found in date 1, or 2 etc, and no need to output that info. Any idea?
Thanks!
what does the line "if 0 then set want_lookup" mean
It should have been "if 0 then set have_rathersmall_lookup(keep=id date);" That's a mapping step to define variables used by the hash. I've added some comments to the already posted code which hopefully will increase clarity.
"What I'd prefer is a dataset that include one indicator of the earlist date it is found."
Best would be if you could post some sample data and then explain the desired result. What would you expect for example if you have records for an ID where the earliest data doesn't match your look-up table? And what would you expect to get for ID's which never have a date in the range?
Whatever you need can be done - but it will be quite a bit of coding. To pre-sort your large table would make things much easier. How big is your table really? Is it worth all the extra coding logic.?
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.