- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi - I am looking at bank lending data by bank within different industries. I want to create a table that has the banks with the largest proportion of their loans in one industry. I am looking for the top 20. This is the code I have and I am really stuck on how to actually calculate the industry proportion of loans by bank - the one piece I need.
It all comes from one huge database.
libname home "/Excel_Output";
libname s4a "/staging";
data industrycon_;
set s4a.bank_201907 (keep = mainbank
mainbankid
bankstate
naics_sect_cd
naics_sector
balance
)
;
if balgrs=0 then delete;
run;
proc sql;
create table indcon_
as
select
distinct mainbank,
mainbankid as mainbankid,
bankstate as bankstate,
sum(balance) as balance_
from industrycon_
group by mainbank, mainbankid;
quit;
proc sql;
create table indconman
as
select
distinct mainbank,
mainbankid as mainbankid,
bankstate as bankstate,
sum(balance) as balance_
from industrycon_
where naics_sector="Manufacturing"
group by mainbank, mainbankid;
quit;
proc sql outobs=20;
create table test
as
select distinct
a.*,b.balgrs1_ as balgrs
from indconman as a
full join indcon_ as b
on a.mainbank = b.mainbank
order by balgrs desc;
quit;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This should do want you want roughly.
** For each bank, find prop of total loans by sector **;
PROC SQL;
CREATE TABLE bank0 AS SELECT
a.*,
sum(balance) as indtot,
b.banktot,
sum(balance)/banktot as prop
FROM industrycon_ a
LEFT JOIN (select distinct mainbank, mainbank, sum(balance) as banktot from industrycon_ group by mainbank, mainbankid) b
ON a.mainbank=b.mainbank and a.mainbankid=b,mainbankid
GROUP BY a.mainbank, a.mainbankid, a.naics_sect_cd, a.naics_sector;
QUIT;
** Sort each bank by prop of loans **;
PROC SORT data=bank0;
BY mainbank mainbankid prop;
RUN;
** Get one record per bank with max prop **;
DATA bank1;
SET bank0;
BY mainbank mainbankid;
IF last.mainbankid;
RUN;
** Re-sort by max prop**;
PROC SORT data=bank1;
BY descending prop;
RUN;
** Find top 20 banks **;
DATA bank2;
SET bank1;
IF _n_>20 THEN DELETE;
RUN;
It does not account for ties, i.e. if a bank has loans exactly split between two industries or if 2 banks are tied on exactly the same proportion but if it is a huge data set it may not matter
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This should do want you want roughly.
** For each bank, find prop of total loans by sector **;
PROC SQL;
CREATE TABLE bank0 AS SELECT
a.*,
sum(balance) as indtot,
b.banktot,
sum(balance)/banktot as prop
FROM industrycon_ a
LEFT JOIN (select distinct mainbank, mainbank, sum(balance) as banktot from industrycon_ group by mainbank, mainbankid) b
ON a.mainbank=b.mainbank and a.mainbankid=b,mainbankid
GROUP BY a.mainbank, a.mainbankid, a.naics_sect_cd, a.naics_sector;
QUIT;
** Sort each bank by prop of loans **;
PROC SORT data=bank0;
BY mainbank mainbankid prop;
RUN;
** Get one record per bank with max prop **;
DATA bank1;
SET bank0;
BY mainbank mainbankid;
IF last.mainbankid;
RUN;
** Re-sort by max prop**;
PROC SORT data=bank1;
BY descending prop;
RUN;
** Find top 20 banks **;
DATA bank2;
SET bank1;
IF _n_>20 THEN DELETE;
RUN;
It does not account for ties, i.e. if a bank has loans exactly split between two industries or if 2 banks are tied on exactly the same proportion but if it is a huge data set it may not matter
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you!! I edited it a little bit, but it gave me exactly what I needed. The dataset is massive. Millions and millions of rows.