Hi, I have been grappling with this problem for a while now and can't figure out a way to do this. I am using SAS EG7.1 and my data structure is Data Vault 2.0 as you will see from the code provided Background In the domain of banking, a home loan is linked to a security i.e. a House. The security can have many accounts and similarly an account may be connected to many securities. In calculating an LVR (loan-to-value ratio) it is important to be able to identify all accounts and all securities that make up a relationship. Data Example TABLE1 Account_id Security_id Link_id 111 xxx a1 222 xxx a2 333 xxx a3 444 xxx a4 444 yyy a5 555 yyy a6 666 yyy a7 The table represents one relationship which needs to be identified as a group ID. This operation needs to be completed on a entire database scale so sequential ID's cannot be assumed. Desired Output TABLE1 Account_id Group_id 111 ZZZ 222 ZZZ 333 ZZZ 444 ZZZ 555 ZZZ 666 ZZZ TABLE2 Security_id Group_id xxx ZZZ yyy ZZZ What I have data links (keep=hub_collateral_key hub_account_key);
merge target.lnk_coll_account (in=a )
target.lsat_coll_account (in=b where=("&long_dt"dt between effective_from_dtm and effective_to_dtm ));
by lnk_coll_account_key;
if a and b then output;
run;
data linkz (keep=hub_collateral_key hub_account_key);
set target.lnk_collateral_account;
run;
proc sort data=links;
by hub_account_key;
run;
data Accounts (keep=group_id hub_account_key) all;
set links;
by hub_collateral_key;
retain group_id 100;
if first.hub_collateral_key then
group_id +1;
output Accounts;
output all;
run;
proc sort data=accounts;
by hub_account_key group_id;
run;
data account_dedupe;
set accounts;
by hub_account_key group_id;
if first.hub_account_key then
output;
run;
proc sql;
create table join_one as
select
a.group_id,
b.hub_collateral_key,
a.hub_account_key
from account_dedupe a
inner join all b on (a.hub_account_key = b.hub_account_key);
quit;
data collaterals (keep=group_id hub_collateral_key) all_2;
set join_one;
output collaterals;
output all_2;
run;
proc sort data=collaterals;
by hub_collateral_key group_id;
run;
data collateral_dedupe;
set collaterals;
by hub_collateral_key group_id;
if first.hub_collateral_key then
output;
run;
proc sql;
create table join_two as
select
a.group_id,
b.hub_account_key,
a.hub_collateral_key
from collateral_dedupe a
inner join all_2 b on (a.hub_collateral_key = b.hub_collateral_key);
quit;
data collateral_groups (keep=group_id hub_collateral_key) account_groups (keep=group_id hub_account_key);
set join_two;
output collateral_groups;
output account_groups;
run;
proc sort data=collateral_groups nodupkey;
by group_id hub_collateral_key;
run;
proc sort data=account_groups nodupkey;
by group_id hub_account_key;
run; Why this doesn't Work This process doesn't identify the accounts only connected to security yyy as a part of the group that occurs first sequentially which is connected to secuity xxx. Constraints Proc Sql to be used as a preference. My table contains thousands of these relationships and I need to be able to identify them all. Hoping you can help. Thanks in advance!
... View more