BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
airchang
Calcite | Level 5

So I have a hash with id and date as its keys. I'd like to use these two keys to find data in "set" dataset. If the data can not be found using id and date, I'd like to see if it can be found using id and date+1. If not then try id and date+2, id and date+3 ... up to date+10. I'm not sure how to achieve this "if then" effect and how to iterate over hash keys. 

 

The dataset is really large, so I was suggested to use Hash rather than proc sql. Any idea?

 

Thanks!!

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

I'm not really sure that I understand what you have. What I've built on is: You have a very large data set and you have a not so large data set for look-up with {id,date}. You need to determine if an {id, date} combination in your very large data set matches with {id,date} in your lookup table - and you want to use a range from date to date+10 for this check.

 

The following code assumes that your large table is stored as a SAS dataset and not as a database table (else please tell us).

data have_rathersmall_lookup;
  do id=1,4,6;
    do date='25sep2016'd,'01oct2016'd;
      output;
    end;
  end;
  format date date9.;
run;

data have_large;
  do id=1 to 7;
    do date='01sep2016'd to '30oct2016'd;
      output;
    end;
  end;
  format date date9.;
run;

data want;
  set have_large;

  if _n_=1 then
    do;
      /* mapping step to ensure variables used in the hash have been defined */
      if 0 then set have_rathersmall_lookup(keep=id date);
      /* define hash */
      dcl hash h1(dataset:'have_rathersmall_lookup(keep=id date)', duplicate:'r');
      h1.defineKey('id','date');
      h1.defineDone();  
    end;

  found_flg=0;
  do date_inc=0 to 9;
    if h1.check(key:id, key:date+date_inc)=0 then 
      do;
        found_flg=1;
        leave;
      end;
  end;
  if date_inc=10 then call missing(date_inc);

run;

added comments to code for clarity based on follow up question

View solution in original post

3 REPLIES 3
Patrick
Opal | Level 21

I'm not really sure that I understand what you have. What I've built on is: You have a very large data set and you have a not so large data set for look-up with {id,date}. You need to determine if an {id, date} combination in your very large data set matches with {id,date} in your lookup table - and you want to use a range from date to date+10 for this check.

 

The following code assumes that your large table is stored as a SAS dataset and not as a database table (else please tell us).

data have_rathersmall_lookup;
  do id=1,4,6;
    do date='25sep2016'd,'01oct2016'd;
      output;
    end;
  end;
  format date date9.;
run;

data have_large;
  do id=1 to 7;
    do date='01sep2016'd to '30oct2016'd;
      output;
    end;
  end;
  format date date9.;
run;

data want;
  set have_large;

  if _n_=1 then
    do;
      /* mapping step to ensure variables used in the hash have been defined */
      if 0 then set have_rathersmall_lookup(keep=id date);
      /* define hash */
      dcl hash h1(dataset:'have_rathersmall_lookup(keep=id date)', duplicate:'r');
      h1.defineKey('id','date');
      h1.defineDone();  
    end;

  found_flg=0;
  do date_inc=0 to 9;
    if h1.check(key:id, key:date+date_inc)=0 then 
      do;
        found_flg=1;
        leave;
      end;
  end;
  if date_inc=10 then call missing(date_inc);

run;

added comments to code for clarity based on follow up question

airchang
Calcite | Level 5

Hi That's very helpful. But what does the line "if 0 then set want_lookup" mean and what is the want_lookup table? 

 

I tried this code, and it outputs a dataset that includes all the data. What I'd prefer is a dataset that include one indicator of the earlist date it is found. For example, if id and date are found in date0 ,there is no need to check whether it can be found in date 1, or 2 etc, and no need to output that info. Any idea?

Thanks!

Patrick
Opal | Level 21

what does the line "if 0 then set want_lookup" mean

 

It should have been "if 0 then set have_rathersmall_lookup(keep=id date);"  That's a mapping step to define variables used by the hash. I've added some comments to the already posted code which hopefully will increase clarity.

 

"What I'd prefer is a dataset that include one indicator of the earlist date it is found."

Best would be if you could post some sample data and then explain the desired result. What would you expect for example if you have records for an ID where the earliest data doesn't match your look-up table? And what would you expect to get for ID's which never have a date in the range?

 

Whatever you need can be done - but it will be quite a bit of coding. To pre-sort your large table would make things much easier. How big is your table really? Is it worth all the extra coding logic.?

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 2246 views
  • 0 likes
  • 2 in conversation