Fluorite | Level 6

## Calculating proportion of variable by another identifier

Hi - I am looking at bank lending data by bank within different industries. I want to create a table that has the banks with the largest proportion of their loans in one industry. I am looking for the top 20. This is the code I have and I am really stuck on how to actually calculate the industry proportion of loans by bank - the one piece I need.

It all comes from one huge database.

libname home "/Excel_Output";
libname s4a "/staging";

data industrycon_;
set s4a.bank_201907	(keep =	mainbank
mainbankid
bankstate
naics_sect_cd
naics_sector
balance
)
;
if balgrs=0 then delete;
run;

proc sql;
create table indcon_
as
select
distinct mainbank,
mainbankid as mainbankid,
bankstate as bankstate,
sum(balance) as balance_
from industrycon_
group by mainbank, mainbankid;
quit;

proc sql;
create table indconman
as
select
distinct mainbank,
mainbankid as mainbankid,
bankstate as bankstate,
sum(balance) as balance_
from industrycon_
where naics_sector="Manufacturing"
group by mainbank, mainbankid;
quit;

proc sql outobs=20;
create table test
as
select distinct
a.*,b.balgrs1_ as balgrs
from indconman as a
full join indcon_ as b
on a.mainbank = b.mainbank
order by balgrs desc;
quit;
1 ACCEPTED SOLUTION

Accepted Solutions
Obsidian | Level 7

## Re: Calculating proportion of variable by another identifier

This should do want you want roughly.

** For each bank, find prop of total loans by sector **;
PROC SQL;
CREATE TABLE bank0 AS SELECT
a.*,
sum(balance) as indtot,
b.banktot,
sum(balance)/banktot as prop
FROM industrycon_ a
LEFT JOIN (select distinct mainbank, mainbank, sum(balance) as banktot from industrycon_ group by mainbank, mainbankid) b
ON a.mainbank=b.mainbank and a.mainbankid=b,mainbankid
GROUP BY a.mainbank, a.mainbankid, a.naics_sect_cd, a.naics_sector;
QUIT;

** Sort each bank by prop of loans **;
PROC SORT data=bank0;
BY mainbank mainbankid prop;
RUN;

** Get one record per bank with max prop **;
DATA bank1;
SET bank0;
BY mainbank mainbankid;
IF last.mainbankid;
RUN;

** Re-sort by max prop**;
PROC SORT data=bank1;
BY descending prop;
RUN;

** Find top 20 banks **;
DATA bank2;
SET bank1;
IF _n_>20 THEN DELETE;
RUN;

It does not account for ties, i.e. if a bank has loans exactly split between two industries or if 2 banks are tied on exactly the same proportion but if it is a huge data set it may not matter

2 REPLIES 2
Obsidian | Level 7

## Re: Calculating proportion of variable by another identifier

This should do want you want roughly.

** For each bank, find prop of total loans by sector **;
PROC SQL;
CREATE TABLE bank0 AS SELECT
a.*,
sum(balance) as indtot,
b.banktot,
sum(balance)/banktot as prop
FROM industrycon_ a
LEFT JOIN (select distinct mainbank, mainbank, sum(balance) as banktot from industrycon_ group by mainbank, mainbankid) b
ON a.mainbank=b.mainbank and a.mainbankid=b,mainbankid
GROUP BY a.mainbank, a.mainbankid, a.naics_sect_cd, a.naics_sector;
QUIT;

** Sort each bank by prop of loans **;
PROC SORT data=bank0;
BY mainbank mainbankid prop;
RUN;

** Get one record per bank with max prop **;
DATA bank1;
SET bank0;
BY mainbank mainbankid;
IF last.mainbankid;
RUN;

** Re-sort by max prop**;
PROC SORT data=bank1;
BY descending prop;
RUN;

** Find top 20 banks **;
DATA bank2;
SET bank1;
IF _n_>20 THEN DELETE;
RUN;

It does not account for ties, i.e. if a bank has loans exactly split between two industries or if 2 banks are tied on exactly the same proportion but if it is a huge data set it may not matter

Fluorite | Level 6

## Re: Calculating proportion of variable by another identifier

Thank you!! I edited it a little bit, but it gave me exactly what I needed. The dataset is massive. Millions and millions of rows.

Discussion stats
• 2 replies
• 1632 views
• 0 likes
• 2 in conversation