Hi,
I have a list of accounts and their pre-accounts with the opening date of the old account.
One account can have more than one pre-account and and one pre-account can also be split into more than one new account.
Here is an example:
data have;
input Account Account_old Open_old;
cards;
40 38 2008
40 39 2009
41 40 2010
42 41 2011
43 42 2012
44 43 2013
45 43 2013
;
run;
So account 38 and 39 are combined together to form account 40. Account 40 is then transformed to account 41 and so on.
Finally account 43 is split to account 44 and 45.
The number of transformations is unknow.
In the end I want a list of all the active accounts (=accounts that are no pre-account to others) and their oldest pre-account.
So this:
data want;
input Account_Active Account_oldest Open_since;
cards;
44 38 2008
45 38 2008
;
run;
I can not think of a way without knowing how many tranformations must be done.
I hope you can... 🙂
@Emjay I will try to address all your questions 🙂
"And how come the ouput is only the desired Account 44 and 45? Which never enter the if or the do?"
This is ensured by (4). Think of it like this. If the current value of Account exists anywhere as an Account_old. Then we're not interested because then the Account is a pre-account to another account.
How will this become a loop, as there is only one Account_old to the just read Account?
It becomes a loop that can iterate more than once because after the lookup, I set Account_Old = Account_Oldest and use that as the lookup value. We continue that process until we reach the end of the "chain". Meaning that the lookup is unsuccesful.
Try adding a Put Statement to the loop like this and check the log. That will give you a nice overview of the iterations.
do until (rc);
rc = hh.find(key : Account_old);
put _all_;
Account_old = Account_oldest;
end;
The first time the rc is a sucess is when Account is 41 and Account_old is 40. And there Account_old is set to 2008.
Strictly speaking: No. We do not even go past step (4) for Account = 41. Because 41 does exist as a pre-account to another account.
In the end this will lead to the desired Account 44 and Open_old 2008, but only because Account_old 38 shows up first in the list and has the oldest date. If Account_old 39 would start in 2007, the final result would still be Account 44 and Open_old 2008, right?
Correct. Good catch 🙂 But only because by default, the hash object allows only 1 item per key value. This can be altered, but for simplicity, this is enough.
Also, please accept my code as the accepted solution if it solves your problem. And do not hesitate to ask further questions.
Kind Regards.
How about this
data have;
input Account Account_old Open_old;
cards;
40 38 2008
40 39 2009
41 40 2010
42 41 2011
43 42 2012
44 43 2013
45 43 2013
;
data want(keep = Account Account_oldest Open_old);
if _N_ = 1 then do;
dcl hash h(dataset : "have");
h.definekey("Account_old");
h.definedone();
dcl hash hh(dataset : "have(rename = Account_old = Account_oldest");
hh.definekey("Account");
hh.definedata("Account_oldest", "Open_old");
hh.definedone();
end;
set have;
if h.check(key : Account);
do until (rc);
rc = hh.find(key : Account_old);
Account_old = Account_oldest;
end;
run;
Result:
Account Open_old Account_oldest 44 2008 38 45 2008 38
Wow, what the hell is THIS??
I've never seen anything like that before...
Could you please enlightening me a little? Looks like I have a lot of reading coming up... 😉
@Emjay I like your enthusiasm 🙂 This is a hash object.
Shortly put, a hash object is a dynamic in-memory data structure available in the SAS data step. The hash object contains key and data values. Therefore, it is often used as an effective lookup tool. However, due to its dynamic nature, it has many other purposes. This is a good example of one of them.
Speaking of reading. If you want to learn more, read Data Management Solutions Using SAS Hash Table Operations by @hashman and @DonH.
Regarding the code, I have added points and explanations below.
data want(keep = Account Account_oldest Open_old);
if _N_ = 1 then do;
dcl hash h(dataset : "have"); /* 1 */
h.definekey("Account_old");
h.definedone();
dcl hash hh(dataset : "have(rename = Account_old = Account_oldest"); /* 2 */
hh.definekey("Account");
hh.definedata("Account_oldest", "Open_old");
hh.definedone();
end;
set have; /* 3 */
if h.check(key : Account); /* 4 */
do until (rc); /* 5 */
rc = hh.find(key : Account_old); /* 6 */
Account_old = Account_oldest; /* 7 */
end;
run;
Thanks for all the explantions. But I'm not sure if I understand it correctly:
So if the data is read sequentially it means in the first line:
/* 4 */
if h.check(key : Account); /* if current account 40 is in the account_old_key of h -> yes */
/* 5 */
do until (rc); /* If rc is a success it's 0. So I guess in this case this means 'Do until rc = 0' */
rc = hh.find(key : Account_old); /* Is Account_old(=38) in the Account_key -> no */
Account_old = Account_oldest; /* Account_old = 2008 */
end;
How will this become a loop, as there is only one Account_old to the just read Account?
Or will it loop over all possible Account_olds? I'm a bit confused by "data is read sequnetially".
The first time the rc is a sucess is when Account is 41 and Account_old is 40. And there Account_old is set to 2008.
In the end this will lead to the desired Account 44 and Open_old 2008, but only because Account_old 38 shows up first in the list and has the oldest date. If Account_old 39 would start in 2007, the final result would still be Account 44 and Open_old 2008, right?
And how come the ouput is only the desired Account 44 and 45? Which never enter the if or the do?
Puh, this hash-thing is quite confusing for me...
@Emjay I will try to address all your questions 🙂
"And how come the ouput is only the desired Account 44 and 45? Which never enter the if or the do?"
This is ensured by (4). Think of it like this. If the current value of Account exists anywhere as an Account_old. Then we're not interested because then the Account is a pre-account to another account.
How will this become a loop, as there is only one Account_old to the just read Account?
It becomes a loop that can iterate more than once because after the lookup, I set Account_Old = Account_Oldest and use that as the lookup value. We continue that process until we reach the end of the "chain". Meaning that the lookup is unsuccesful.
Try adding a Put Statement to the loop like this and check the log. That will give you a nice overview of the iterations.
do until (rc);
rc = hh.find(key : Account_old);
put _all_;
Account_old = Account_oldest;
end;
The first time the rc is a sucess is when Account is 41 and Account_old is 40. And there Account_old is set to 2008.
Strictly speaking: No. We do not even go past step (4) for Account = 41. Because 41 does exist as a pre-account to another account.
In the end this will lead to the desired Account 44 and Open_old 2008, but only because Account_old 38 shows up first in the list and has the oldest date. If Account_old 39 would start in 2007, the final result would still be Account 44 and Open_old 2008, right?
Correct. Good catch 🙂 But only because by default, the hash object allows only 1 item per key value. This can be altered, but for simplicity, this is enough.
Also, please accept my code as the accepted solution if it solves your problem. And do not hesitate to ask further questions.
Kind Regards.
Sorry for my late reply, but this day is just a mess...
Thanks a lot for your patience with me. 🙂
I've added the put _all_ which leaves me even more confused.
The sas-documentation that you added states for KEY: keyvalue:
specifies the key value whose type must match the corresponding key variable that is specified in a DEFINEKEY method call.
Which left me thinking that
if h.check(key : Account);
means:
if have.Account in any h.Account_old
In Account_ols are all the numbers from 38-43 and now I see that the first output is Account 44 so the if must mean exactly the opposite.
This leaves me with the assumption that
hh.find(key : Account_old);
also doesn't translate into
look for have.Account_old in all the hh.Accounts
This combined with the fact, that if I change my original data to
data have;
input Account Account_old Open_old;
cards;
40 38 2008
40 39 2007
41 40 2010
42 41 2011
43 42 2012
44 43 2013
45 43 2013
;
run;
so that the oldest is not in Account_old 38 but 39 a little shattering.
I'm terribly sorry to ask but can you think of a non-hash way? I need to catch a deadline and I'm afraid till then I won't wrap my head around hash-objects...
First, thanks @PeterClemmensen for the reference to the book.
It it not clear what you are asking @Emjay. since you reference two code snippets without a specific question.
I suspect your issue is that you don't recognize that the hash object is actually a data table that you can create and access inside a data step.
So what we have here is three distinct tables of data being accessed inside the data step.
The set statement is looping thru each observation in the have data set - on observation per execution of the DATA step.
It then subsets the data (step 4 in the code) to only continue processing if the account number is not found in the hash table h. The check method returns 0 if it is found and returns a non-zero value if not. So this is subsetting the data to just those that were never a pre-account.
Next the do until construct is simply used to find the first row in the hash table hh that has the the value of account on the current row equal to the account_old value in HH. Since your data are sorted so the oldest pre-account is listed first, as soon as the find method finds it, the value is copied and the loop ends.
Hi @DonH
thanks for the explanations.
I understood that the hash objects are tables, my main issue was the if-statement.
The key was this sentence:
The check method returns 0 if it is found and returns a non-zero value if not. So this is subsetting the data to just those that were never a pre-account.
🙂
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.