BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
airchang
Calcite | Level 5

So I have a hash with id and date as its keys. I'd like to use these two keys to find data in "set" dataset. If the data can not be found using id and date, I'd like to see if it can be found using id and date+1. If not then try id and date+2, id and date+3 ... up to date+10. I'm not sure how to achieve this "if then" effect and how to iterate over hash keys. 

 

The dataset is really large, so I was suggested to use Hash rather than proc sql. Any idea?

 

Thanks!!

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

I'm not really sure that I understand what you have. What I've built on is: You have a very large data set and you have a not so large data set for look-up with {id,date}. You need to determine if an {id, date} combination in your very large data set matches with {id,date} in your lookup table - and you want to use a range from date to date+10 for this check.

 

The following code assumes that your large table is stored as a SAS dataset and not as a database table (else please tell us).

data have_rathersmall_lookup;
  do id=1,4,6;
    do date='25sep2016'd,'01oct2016'd;
      output;
    end;
  end;
  format date date9.;
run;

data have_large;
  do id=1 to 7;
    do date='01sep2016'd to '30oct2016'd;
      output;
    end;
  end;
  format date date9.;
run;

data want;
  set have_large;

  if _n_=1 then
    do;
      /* mapping step to ensure variables used in the hash have been defined */
      if 0 then set have_rathersmall_lookup(keep=id date);
      /* define hash */
      dcl hash h1(dataset:'have_rathersmall_lookup(keep=id date)', duplicate:'r');
      h1.defineKey('id','date');
      h1.defineDone();  
    end;

  found_flg=0;
  do date_inc=0 to 9;
    if h1.check(key:id, key:date+date_inc)=0 then 
      do;
        found_flg=1;
        leave;
      end;
  end;
  if date_inc=10 then call missing(date_inc);

run;

added comments to code for clarity based on follow up question

View solution in original post

3 REPLIES 3
Patrick
Opal | Level 21

I'm not really sure that I understand what you have. What I've built on is: You have a very large data set and you have a not so large data set for look-up with {id,date}. You need to determine if an {id, date} combination in your very large data set matches with {id,date} in your lookup table - and you want to use a range from date to date+10 for this check.

 

The following code assumes that your large table is stored as a SAS dataset and not as a database table (else please tell us).

data have_rathersmall_lookup;
  do id=1,4,6;
    do date='25sep2016'd,'01oct2016'd;
      output;
    end;
  end;
  format date date9.;
run;

data have_large;
  do id=1 to 7;
    do date='01sep2016'd to '30oct2016'd;
      output;
    end;
  end;
  format date date9.;
run;

data want;
  set have_large;

  if _n_=1 then
    do;
      /* mapping step to ensure variables used in the hash have been defined */
      if 0 then set have_rathersmall_lookup(keep=id date);
      /* define hash */
      dcl hash h1(dataset:'have_rathersmall_lookup(keep=id date)', duplicate:'r');
      h1.defineKey('id','date');
      h1.defineDone();  
    end;

  found_flg=0;
  do date_inc=0 to 9;
    if h1.check(key:id, key:date+date_inc)=0 then 
      do;
        found_flg=1;
        leave;
      end;
  end;
  if date_inc=10 then call missing(date_inc);

run;

added comments to code for clarity based on follow up question

airchang
Calcite | Level 5

Hi That's very helpful. But what does the line "if 0 then set want_lookup" mean and what is the want_lookup table? 

 

I tried this code, and it outputs a dataset that includes all the data. What I'd prefer is a dataset that include one indicator of the earlist date it is found. For example, if id and date are found in date0 ,there is no need to check whether it can be found in date 1, or 2 etc, and no need to output that info. Any idea?

Thanks!

Patrick
Opal | Level 21

what does the line "if 0 then set want_lookup" mean

 

It should have been "if 0 then set have_rathersmall_lookup(keep=id date);"  That's a mapping step to define variables used by the hash. I've added some comments to the already posted code which hopefully will increase clarity.

 

"What I'd prefer is a dataset that include one indicator of the earlist date it is found."

Best would be if you could post some sample data and then explain the desired result. What would you expect for example if you have records for an ID where the earliest data doesn't match your look-up table? And what would you expect to get for ID's which never have a date in the range?

 

Whatever you need can be done - but it will be quite a bit of coding. To pre-sort your large table would make things much easier. How big is your table really? Is it worth all the extra coding logic.?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 1798 views
  • 0 likes
  • 2 in conversation