BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
buddha_d
Pyrite | Level 9

I have the following sample data. 

 

data have;
input AcctID $4. ACCTNAME $12.;
cards;
1234 checking
1234 saving
1234 HSA-loan
5678 checking
8907 saving
7658 HSA-loan
4356 checking
4356 saving
9874 checking
9874 HSA-loan
5437 creditcard
5437 checking
5461 checking
5461 heloc-loan
2543 saving
2543 HSA-loan

7891 HSA-loan

7111 saving 
;
run;

 

I want to get the account IDs which has these conditions

1. Get all account IDs which has only 'checking' account name (no other account name is associated eg: 5678 checking)

2. If there is 'checking' account name along with 'saving' and 'HSA-loan' then I want to get those account IDs (eg: 1234 checking , 1234 saving & 1234 HSA-loan). 

3. If there is 'checking' account name along with 'saving'  then get those account IDs (eg: 4356 checking & 4356 saving).

4.  If there is 'checking' account name along with 'HSA-loan' then I want to get those account IDs (eg: 9874 checking & 9874 HSA-loan) 

 

I don't want to get any other account IDs especially if they are associated with account name of 'checking' and 'creditcard' etc or any Individual Account IDs which has account name as 'saving' or 'HSA-loan' individually or together.

 

The result should have 

AcctID ACCOUNT

1234 checking
1234 saving
1234 HSA-loan
5678 checking

4356 checking
4356 saving
9874 checking
9874 HSA-loan

 

I used the code 

 

proc sql;
create table want as
select acctID,ACCTNAME
from HAVE
group by acctID,ACCTNAME
having (count(distinct ACCTNAME)=1 and UPCASE(ACCTNAME)='CHECKING')
;
QUIT;

 

This is yielding all the account IDs with checking and other account names are there. Could someone help me to get the result I want. 

 

Thanks in advance. 

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

As a Boolean expression,

upcase(acctname) ~in ('CHECKING', 'SAVING', 'HSA-LOAN')

 is either 1 (TRUE) or 0 (FALSE) for an observation. Hence, the summary statistic

sum(upcase(acctname) ~in ('CHECKING', 'SAVING', 'HSA-LOAN'))

is the number of observations in an AcctID "BY group" (note the GROUP BY clause) where that Boolean expression is TRUE. We are interested in the case that this number is zero because this means that no observation with an UPCASE(ACCTNAME) value other than 'CHECKING', 'SAVING' or 'HSA-LOAN' exists in the BY group. The possible values of that sum are 0, 1, 2, etc. Therefore, the condition

sum(upcase(acctname) ~in ('CHECKING', 'SAVING', 'HSA-LOAN'))=0

can also be written as

~sum(upcase(acctname) ~in ('CHECKING', 'SAVING', 'HSA-LOAN'))

because 0 -- interpreted as a Boolean value -- is FALSE, whereas all other non-missing numeric values are interpreted as TRUE. Feel free to replace ~sum(...) with sum(...)=0 if it's easier to understand.

 

The ORDER BY clause is not necessary if the (virtually unpredictable) default order of observations produced by PROC SQL is good enough. My idea was to sort dataset WANT by AcctID and within an AcctID "BY group" by Account -- yet not alphabetically, but so that observations with UPCASE(Account)='CHECKING' come first, followed by observations with UPCASE(Account)='SAVING', followed by observations with UPCASE(Account)='HSA-LOAN'. The FINDC function searches the constant string "csH" and if it finds the first character of variable Account (i.e., the result of first(account)), irrespective of case (see modifier 'i'), it returns the position of that character in the string: 1 for a "C", 2 for an "S" and 3 for an "H" -- so that sorting by this position number yields the intended order of the three possible values of  Account.

View solution in original post

4 REPLIES 4
FreelanceReinh
Jade | Level 19

Hello @buddha_d,

 

So the condition is: All values of upcase(acctname) for an AcctId must be from the set {'CHECKING', 'SAVING', 'HSA-LOAN'} and 'CHECKING' must occur. Correct?

 

If so, this should select the observations in question:

proc sql;
create table want as
select AcctID, acctname as ACCOUNT
from have
group by AcctID
having max(upcase(acctname)='CHECKING') & ~sum(upcase(acctname) ~in ('CHECKING', 'SAVING', 'HSA-LOAN'))
order by AcctID, findc('csH',first(account),'i');
quit;
buddha_d
Pyrite | Level 9

I am running the query in DB. Once I pull the records, let me try this code. It works for me with the dummy data provided. Thank you so much, FreelanceReinhard.

buddha_d
Pyrite | Level 9

Could you please explain me what this code doing here

Thanks

 ~sum(upcase(acctname) ~in ('CHECKING', 'SAVING', 'HSA-LOAN'))
order by AcctID, findc('csH',first(account),'i')

 

FreelanceReinh
Jade | Level 19

As a Boolean expression,

upcase(acctname) ~in ('CHECKING', 'SAVING', 'HSA-LOAN')

 is either 1 (TRUE) or 0 (FALSE) for an observation. Hence, the summary statistic

sum(upcase(acctname) ~in ('CHECKING', 'SAVING', 'HSA-LOAN'))

is the number of observations in an AcctID "BY group" (note the GROUP BY clause) where that Boolean expression is TRUE. We are interested in the case that this number is zero because this means that no observation with an UPCASE(ACCTNAME) value other than 'CHECKING', 'SAVING' or 'HSA-LOAN' exists in the BY group. The possible values of that sum are 0, 1, 2, etc. Therefore, the condition

sum(upcase(acctname) ~in ('CHECKING', 'SAVING', 'HSA-LOAN'))=0

can also be written as

~sum(upcase(acctname) ~in ('CHECKING', 'SAVING', 'HSA-LOAN'))

because 0 -- interpreted as a Boolean value -- is FALSE, whereas all other non-missing numeric values are interpreted as TRUE. Feel free to replace ~sum(...) with sum(...)=0 if it's easier to understand.

 

The ORDER BY clause is not necessary if the (virtually unpredictable) default order of observations produced by PROC SQL is good enough. My idea was to sort dataset WANT by AcctID and within an AcctID "BY group" by Account -- yet not alphabetically, but so that observations with UPCASE(Account)='CHECKING' come first, followed by observations with UPCASE(Account)='SAVING', followed by observations with UPCASE(Account)='HSA-LOAN'. The FINDC function searches the constant string "csH" and if it finds the first character of variable Account (i.e., the result of first(account)), irrespective of case (see modifier 'i'), it returns the position of that character in the string: 1 for a "C", 2 for an "S" and 3 for an "H" -- so that sorting by this position number yields the intended order of the three possible values of  Account.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1288 views
  • 1 like
  • 2 in conversation