Hello
I have a data set that show the ownership of bank accounts.
There are 2 columns:
Account- Key number of the bank account
ID - Identification number of the owner of the bank account
My task is to classify each bank account into one of the following groups:
1- Bank account has only one ownership and the ID doesn't appear in other bank accounts
2- Bank account has only one ownership and the ID appear in other bank accounts
3-Bank account has more than one ID ownership and all ID's don't appear in other bank accounts.
4-Bank account has more than one ID ownership and at least one ID appear in other bank accounts.
May anyone show a code to do it please?
Data rawtbl;
input account ID $ ;
cards;
111 X
111 Y
222 X
333 Y
444 Y
444 R
789 R
555 Q
666 V
666 T
777 A
888 A
999 A
999 B
123 B
;
Run;
2-Bank account has more than one ownership and the ID's doesn't appear in other bank accounts.
PLEASE do us a big, big favor and name your example datasets "have", like we all do; it is really painful to realize that you name it "rawtbl" when the code crashes on the first test run.
First, create an intermediate table that contains all ID's and a flag for multiple usage:
proc sql;
create table id as
select
id,
case
when count(*) gt 1
then 1
else 0
end as id_mult
from have
group by id
;
quit;
Next, sort the have dataset by account:
proc sort data=have;
by account;
run;
so we can use BY in the final data step:
data want;
set have;
by account;
if _n_ = 1
then do;
declare hash idm (dataset:"id");
idm.definekey("id");
idm.definedata("id_mult");
idm.definedone();
call missing(id_mult);
end;
retain mult_id;
if first.account then mult_id = 0;
rc = idm.find();
mult_id = max(mult_id,id_mult);
if last.account;
mult_acct = not first.account;
keep account mult_acct mult_id;
run;
The intermediate table is loaded into a hash object for lookup.
Hello,
PROC FREQ gives all the info you are looking for.
Can you go on from there?
Data rawtbl;
input account ID $ ;
cards;
111 X
111 Y
222 X
333 Y
444 Y
444 R
789 R
555 Q
666 V
666 T
777 A
888 A
999 A
999 B
123 B
;
run;
/*
1- Bank account has only one ownership and the ID doesn't appear in other bank accounts
2- Bank account has only one ownership and the ID appear in other bank accounts
3- Bank account has more than one ID ownership and all ID's don't appear in other bank accounts.
4- Bank account has more than one ID ownership and at least one ID appear in other bank accounts.
*/
ods trace off;
ods output CrossTabFreqs=work.CrossTabFreqs;
PROC FREQ data=rawtbl noprint;
tables account * ID / missing out=count_account_ID;
run;
/* end of program */
Good luck,
Koen
@sbxkoenk I feel you need to explain your line of thought a bit more. The OP basically asks for 4 flag variables. What you propose is the result of a proc freq that has the same number of rows than the source table. Not sure how that helps.
Hello @Patrick ,
You should build upon the ODS output dataset.
ods output CrossTabFreqs=work.CrossTabFreqs;
If not solved this evening (after the work day) I will complete my program.
I haven't thought it through for the full 100%, so I maybe wrong that all 4 flag-variables can be derived from CrossTabFreqs.
Cheers,
Koen
Not sure what do you expect as result exactly, so i created a variable group having the values 1 to 4 to identify the groups you have defined.
And i assumed, that the data is grouped by account.
/* Get number of accounts per ID */
proc summary data= work.rawtbl nway;
class id;
output out= work.Ids(drop= _type_ rename= (_freq_ = count));
run;
/* Using work.Ids as hash-object */
data want;
if 0 then set work.Ids;
set work.rawtbl;
by account notsorted;
length group 8 multiOwner multiAccount 8;
retain multiAccount;
if _n_ = 1 then do;
declare hash owner(dataset: 'work.Ids');
owner.defineKey('id');
owner.defineData('count');
owner.defineDone();
end;
if first.account then do;
multiAccount = 1;
end;
multiOwner = not (first.account and last.account);
rc = owner.find();
multiAccount = multiAccount and (count > 1);
if last.account then do;
group = input(cats(multiOwner, multiAccount), binary2.) + 1;
output;
end;
drop count rc id;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.