BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Emjay
Obsidian | Level 7

Hi,

I have a list of accounts and their pre-accounts with the opening date of the old account.

One account can have more than one pre-account and and one pre-account can also be split into more than one new account.

 

Here is an example:

data have;
input Account Account_old Open_old;
cards;
40 38 2008
40 39 2009
41 40 2010
42 41 2011
43 42 2012
44 43 2013
45 43 2013
;
run;

So account 38 and 39 are combined together to form account 40. Account 40 is then transformed to account 41 and so on.

Finally account 43 is split to account 44 and 45.

 

The number of transformations is unknow.

 

In the end I want a list of all the active accounts (=accounts that are no pre-account to others) and their oldest pre-account.

 

So this:

data want;
input Account_Active Account_oldest Open_since;
cards;
44 38 2008
45 38 2008
;
run;

I can not think of a way without knowing how many tranformations must be done.

I hope you can... 🙂

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

@Emjay I will try to address all your questions 🙂

 

"And how come the ouput is only the desired Account 44 and 45? Which never enter the if or the do?"

 

This is ensured by (4). Think of it like this. If the current value of Account exists anywhere as an Account_old. Then we're not interested because then the Account is a pre-account to another account. 

 

How will this become a loop, as there is only one Account_old to the just read Account?

 

It becomes a loop that can iterate more than once because after the lookup, I set Account_Old = Account_Oldest and use that as the lookup value. We continue that process until we reach the end of the "chain". Meaning that the lookup is unsuccesful.

 

Try adding a Put Statement to the loop like this and check the log. That will give you a nice overview of the iterations. 

 

   do until (rc);
      rc = hh.find(key : Account_old);
      put _all_;
      Account_old = Account_oldest;
   end;

The first time the rc is a sucess is when Account is 41 and Account_old is 40. And there Account_old is set to 2008.

 

Strictly speaking: No. We do not even go past step (4) for Account = 41. Because 41 does exist as a pre-account to another account.

 

In the end this will lead to the desired Account 44 and Open_old 2008, but only because Account_old 38 shows up first in the list and has the oldest date. If Account_old 39 would start in 2007, the final result would still be Account 44 and Open_old 2008, right?

 

Correct. Good catch 🙂 But only because by default, the hash object allows only 1 item per key value. This can be altered, but for simplicity, this is enough.

 

Also, please accept my code as the accepted solution if it solves your problem. And do not hesitate to ask further questions.

 

Kind Regards. 

View solution in original post

8 REPLIES 8
PeterClemmensen
Tourmaline | Level 20

How about this

 

data have;
input Account Account_old Open_old;
cards;
40 38 2008
40 39 2009
41 40 2010
42 41 2011
43 42 2012
44 43 2013
45 43 2013
;

data want(keep = Account Account_oldest Open_old);

   if _N_ = 1 then do;
      dcl hash h(dataset : "have");
      h.definekey("Account_old");
      h.definedone();

      dcl hash hh(dataset : "have(rename = Account_old = Account_oldest");
      hh.definekey("Account");
      hh.definedata("Account_oldest", "Open_old");
      hh.definedone();
   end;

   set have;

   if h.check(key : Account);

   do until (rc);
      rc = hh.find(key : Account_old);
      Account_old = Account_oldest;
   end;

run;

 

Result:

 

Account Open_old Account_oldest 
44      2008     38 
45      2008     38   
Emjay
Obsidian | Level 7

@PeterClemmensen 

Wow, what the hell is THIS??

I've never seen anything like that before...

 

Could you please enlightening me a little? Looks like I have a lot of reading coming up... 😉

PeterClemmensen
Tourmaline | Level 20

@Emjay I like your enthusiasm 🙂 This is a hash object.

 

Shortly put, a hash object is a dynamic in-memory data structure available in the SAS data step. The hash object contains key and data values. Therefore, it is often used as an effective lookup tool. However, due to its dynamic nature, it has many other purposes. This is a good example of one of them. 

 

Speaking of reading. If you want to learn more, read Data Management Solutions Using SAS Hash Table Operations by @hashman and @DonH

 

Regarding the code, I have added points and explanations below.

 

  1. Declare the hash object h. Read the data set have into h and use Account_old as key.
  2. Declare the hash object hh. Read the data set have into and use Account as key and Account_oldest (renamed) and Open_old as data variables.
  3. Read the have data set sequentially.
  4. Proceed only if the current value of Account exists as an Account_old. Meaning, we want only accounts that are no pre-account to others.
  5. I use a Do Until Loop to perform a series of lookups. For example, for Account 44, we want to follow the path 44-43-42-41-40-38. This is what is ensured in (5), (6) and (7).
  6. Perform the actual lookup using the Find() Method
  7. Set Prepare the next lookup in the series by setting Account_Old = Account_Oldest.

 

 

data want(keep = Account Account_oldest Open_old);

   if _N_ = 1 then do;
      dcl hash h(dataset : "have");                                         /* 1 */
      h.definekey("Account_old");
      h.definedone();

      dcl hash hh(dataset : "have(rename = Account_old = Account_oldest");  /* 2 */
      hh.definekey("Account");
      hh.definedata("Account_oldest", "Open_old");
      hh.definedone();
   end;

   set have;                                                                /* 3 */

   if h.check(key : Account);                                               /* 4 */

   do until (rc);                                                           /* 5 */
      rc = hh.find(key : Account_old);                                      /* 6 */
      Account_old = Account_oldest;                                         /* 7 */
   end;

run;

 

 

Emjay
Obsidian | Level 7

@PeterClemmensen 

 

Thanks for all the explantions. But I'm not sure if I understand it correctly:

 

So if the data is read sequentially it means in  the first line:

/* 4 */

if h.check(key : Account);  /* if current account 40 is in the account_old_key of h -> yes */

 /* 5 */

do until (rc); /* If rc is a success it's 0. So I guess in this case this means 'Do until rc = 0' */
      rc = hh.find(key : Account_old);  /* Is Account_old(=38) in the Account_key -> no */
      Account_old = Account_oldest; /* Account_old = 2008 */
end;

How will this become a loop, as there is only one Account_old to the just read Account?

Or will it loop over all possible Account_olds? I'm a bit confused by "data is read sequnetially".

 

The first time the rc is a sucess is when Account is 41 and Account_old is 40. And there Account_old is set to 2008.

 

In the end this will lead to the desired Account 44 and Open_old 2008, but only because Account_old 38 shows up first in the list and has the oldest date. If Account_old 39 would start in 2007, the final result would still be Account 44 and Open_old 2008, right?

 

And how come the ouput is only the desired Account 44 and 45? Which never enter the if or the do?

 

Puh, this hash-thing is quite confusing for me...

PeterClemmensen
Tourmaline | Level 20

@Emjay I will try to address all your questions 🙂

 

"And how come the ouput is only the desired Account 44 and 45? Which never enter the if or the do?"

 

This is ensured by (4). Think of it like this. If the current value of Account exists anywhere as an Account_old. Then we're not interested because then the Account is a pre-account to another account. 

 

How will this become a loop, as there is only one Account_old to the just read Account?

 

It becomes a loop that can iterate more than once because after the lookup, I set Account_Old = Account_Oldest and use that as the lookup value. We continue that process until we reach the end of the "chain". Meaning that the lookup is unsuccesful.

 

Try adding a Put Statement to the loop like this and check the log. That will give you a nice overview of the iterations. 

 

   do until (rc);
      rc = hh.find(key : Account_old);
      put _all_;
      Account_old = Account_oldest;
   end;

The first time the rc is a sucess is when Account is 41 and Account_old is 40. And there Account_old is set to 2008.

 

Strictly speaking: No. We do not even go past step (4) for Account = 41. Because 41 does exist as a pre-account to another account.

 

In the end this will lead to the desired Account 44 and Open_old 2008, but only because Account_old 38 shows up first in the list and has the oldest date. If Account_old 39 would start in 2007, the final result would still be Account 44 and Open_old 2008, right?

 

Correct. Good catch 🙂 But only because by default, the hash object allows only 1 item per key value. This can be altered, but for simplicity, this is enough.

 

Also, please accept my code as the accepted solution if it solves your problem. And do not hesitate to ask further questions.

 

Kind Regards. 

Emjay
Obsidian | Level 7

@PeterClemmensen 

 

Sorry for my late reply, but this day is just a mess...

Thanks a lot for your patience with me. 🙂

 

I've added the put _all_ which leaves me even more confused.

The sas-documentation that you added states for KEY: keyvalue:

specifies the key value whose type must match the corresponding key variable that is specified in a DEFINEKEY method call.

 

Which left me thinking that

if h.check(key : Account);

 means:

if have.Account in any h.Account_old 

In Account_ols are all the numbers from 38-43 and now I see that the first output is Account 44 so the if must mean exactly the opposite.

This leaves me with the assumption that 

hh.find(key : Account_old);

also doesn't translate into

look for have.Account_old in all the hh.Accounts

This combined with the fact, that if I change my original data to 

data have;
input Account Account_old Open_old;
cards;
40 38 2008
40 39 2007
41 40 2010
42 41 2011
43 42 2012
44 43 2013
45 43 2013
;
run;

 so that the oldest is not in Account_old 38 but 39 a little shattering.

 

I'm terribly sorry to ask but can you think of a non-hash way? I need to catch a deadline and I'm afraid till then I won't wrap my head around hash-objects...

 

DonH
Lapis Lazuli | Level 10

First, thanks @PeterClemmensen for the reference to the book.

It it not clear what you are asking @Emjay. since you reference two code snippets without a specific question.

I suspect your issue is that you don't recognize that the hash object is actually a data table that you can create and access inside a data step. 

 

So what we have here is three distinct tables of data being accessed inside the data step.

 

  1. The hash object h is created from your data set and is all the distinct values of account_old.
  2. The hash object hh is the list of all account by account_old combinations - with account_old renamed to account_oldest.
  3. Your input data have that is read in the SET statement.

The set statement is looping thru each observation in the have data set - on observation per execution of the DATA step.

It then subsets the data (step 4 in the code) to only continue processing if the account number is not found in the hash table h. The check method returns 0 if it is found and returns a non-zero value if not. So this is subsetting the data to just those that were never a pre-account.


Next the do until construct is simply used to find the first row in the hash table hh that has the the value of account on the current row equal to the account_old value in HH. Since your data are sorted so the oldest pre-account is listed first, as soon as the find method finds it, the value is copied and the loop ends.

Emjay
Obsidian | Level 7

Hi @DonH 

thanks for the explanations. 

I understood that the hash objects are tables, my main issue was the if-statement.

The key was this sentence:

 


The check method returns 0 if it is found and returns a non-zero value if not. So this is subsetting the data to just those that were never a pre-account.

🙂

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 870 views
  • 4 likes
  • 3 in conversation