Hi - I am looking at bank lending data by bank within different industries. I want to create a table that has the banks with the largest proportion of their loans in one industry. I am looking for the top 20. This is the code I have and I am really stuck on how to actually calculate the industry proportion of loans by bank - the one piece I need.
It all comes from one huge database.
libname home "/Excel_Output";
libname s4a "/staging";
data industrycon_;
set s4a.bank_201907 (keep = mainbank
mainbankid
bankstate
naics_sect_cd
naics_sector
balance
)
;
if balgrs=0 then delete;
run;
proc sql;
create table indcon_
as
select
distinct mainbank,
mainbankid as mainbankid,
bankstate as bankstate,
sum(balance) as balance_
from industrycon_
group by mainbank, mainbankid;
quit;
proc sql;
create table indconman
as
select
distinct mainbank,
mainbankid as mainbankid,
bankstate as bankstate,
sum(balance) as balance_
from industrycon_
where naics_sector="Manufacturing"
group by mainbank, mainbankid;
quit;
proc sql outobs=20;
create table test
as
select distinct
a.*,b.balgrs1_ as balgrs
from indconman as a
full join indcon_ as b
on a.mainbank = b.mainbank
order by balgrs desc;
quit;
This should do want you want roughly.
** For each bank, find prop of total loans by sector **;
PROC SQL;
CREATE TABLE bank0 AS SELECT
a.*,
sum(balance) as indtot,
b.banktot,
sum(balance)/banktot as prop
FROM industrycon_ a
LEFT JOIN (select distinct mainbank, mainbank, sum(balance) as banktot from industrycon_ group by mainbank, mainbankid) b
ON a.mainbank=b.mainbank and a.mainbankid=b,mainbankid
GROUP BY a.mainbank, a.mainbankid, a.naics_sect_cd, a.naics_sector;
QUIT;
** Sort each bank by prop of loans **;
PROC SORT data=bank0;
BY mainbank mainbankid prop;
RUN;
** Get one record per bank with max prop **;
DATA bank1;
SET bank0;
BY mainbank mainbankid;
IF last.mainbankid;
RUN;
** Re-sort by max prop**;
PROC SORT data=bank1;
BY descending prop;
RUN;
** Find top 20 banks **;
DATA bank2;
SET bank1;
IF _n_>20 THEN DELETE;
RUN;
It does not account for ties, i.e. if a bank has loans exactly split between two industries or if 2 banks are tied on exactly the same proportion but if it is a huge data set it may not matter
This should do want you want roughly.
** For each bank, find prop of total loans by sector **;
PROC SQL;
CREATE TABLE bank0 AS SELECT
a.*,
sum(balance) as indtot,
b.banktot,
sum(balance)/banktot as prop
FROM industrycon_ a
LEFT JOIN (select distinct mainbank, mainbank, sum(balance) as banktot from industrycon_ group by mainbank, mainbankid) b
ON a.mainbank=b.mainbank and a.mainbankid=b,mainbankid
GROUP BY a.mainbank, a.mainbankid, a.naics_sect_cd, a.naics_sector;
QUIT;
** Sort each bank by prop of loans **;
PROC SORT data=bank0;
BY mainbank mainbankid prop;
RUN;
** Get one record per bank with max prop **;
DATA bank1;
SET bank0;
BY mainbank mainbankid;
IF last.mainbankid;
RUN;
** Re-sort by max prop**;
PROC SORT data=bank1;
BY descending prop;
RUN;
** Find top 20 banks **;
DATA bank2;
SET bank1;
IF _n_>20 THEN DELETE;
RUN;
It does not account for ties, i.e. if a bank has loans exactly split between two industries or if 2 banks are tied on exactly the same proportion but if it is a huge data set it may not matter
Thank you!! I edited it a little bit, but it gave me exactly what I needed. The dataset is massive. Millions and millions of rows.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.