Solved: Calculating proportion of variable by another identifier

claforest1 · Posted 09-11-2019 03:32 PM

Hi - I am looking at bank lending data by bank within different industries. I want to create a table that has the banks with the largest proportion of their loans in one industry. I am looking for the top 20. This is the code I have and I am really stuck on how to actually calculate the industry proportion of loans by bank - the one piece I need.

It all comes from one huge database.

libname home "/Excel_Output";
libname s4a "/staging";

data industrycon_;
	set s4a.bank_201907	(keep =	mainbank
					mainbankid
					bankstate
					naics_sect_cd
					naics_sector
					balance
					)
					;
	if balgrs=0 then delete;
run;

proc sql;
	create table indcon_
	as
	select
		distinct mainbank,
		mainbankid as mainbankid,
		bankstate as bankstate,
		sum(balance) as balance_
	from industrycon_
	group by mainbank, mainbankid;
quit;

proc sql;
	create table indconman
	as
	select
		distinct mainbank,
		mainbankid as mainbankid,
		bankstate as bankstate,
		sum(balance) as balance_
	from industrycon_
	where naics_sector="Manufacturing"
	group by mainbank, mainbankid;
quit;




proc sql outobs=20;
	create table test
	as
	select distinct
	a.*,b.balgrs1_ as balgrs
from indconman as a 
	full join indcon_ as b
	on a.mainbank = b.mainbank
	order by balgrs desc;
quit;

SwissC · Posted 09-12-2019 06:05 AM

This should do want you want roughly.

** For each bank, find prop of total loans by sector **;
PROC SQL;
  CREATE TABLE bank0 AS SELECT
  a.*, 
  sum(balance) as indtot, 
  b.banktot,
  sum(balance)/banktot as prop  
  FROM industrycon_ a
  LEFT JOIN (select distinct mainbank, mainbank, sum(balance) as banktot from industrycon_ group by mainbank, mainbankid) b
    ON a.mainbank=b.mainbank and a.mainbankid=b,mainbankid  
  GROUP BY a.mainbank, a.mainbankid, a.naics_sect_cd, a.naics_sector;
QUIT;

** Sort each bank by prop of loans **;
PROC SORT data=bank0;
  BY mainbank mainbankid prop;
RUN;

** Get one record per bank with max prop **;
DATA bank1;
  SET bank0;
  BY mainbank mainbankid;
  IF last.mainbankid;
RUN;

** Re-sort by max prop**;
PROC SORT data=bank1;
  BY descending prop;
RUN;

** Find top 20 banks **;
DATA bank2;
  SET bank1;
  IF _n_>20 THEN DELETE;
RUN;

It does not account for ties, i.e. if a bank has loans exactly split between two industries or if 2 banks are tied on exactly the same proportion but if it is a huge data set it may not matter

View solution in original post

SwissC · Posted 09-12-2019 06:05 AM

This should do want you want roughly.

** For each bank, find prop of total loans by sector **;
PROC SQL;
  CREATE TABLE bank0 AS SELECT
  a.*, 
  sum(balance) as indtot, 
  b.banktot,
  sum(balance)/banktot as prop  
  FROM industrycon_ a
  LEFT JOIN (select distinct mainbank, mainbank, sum(balance) as banktot from industrycon_ group by mainbank, mainbankid) b
    ON a.mainbank=b.mainbank and a.mainbankid=b,mainbankid  
  GROUP BY a.mainbank, a.mainbankid, a.naics_sect_cd, a.naics_sector;
QUIT;

** Sort each bank by prop of loans **;
PROC SORT data=bank0;
  BY mainbank mainbankid prop;
RUN;

** Get one record per bank with max prop **;
DATA bank1;
  SET bank0;
  BY mainbank mainbankid;
  IF last.mainbankid;
RUN;

** Re-sort by max prop**;
PROC SORT data=bank1;
  BY descending prop;
RUN;

** Find top 20 banks **;
DATA bank2;
  SET bank1;
  IF _n_>20 THEN DELETE;
RUN;

It does not account for ties, i.e. if a bank has loans exactly split between two industries or if 2 banks are tied on exactly the same proportion but if it is a huge data set it may not matter

claforest1 · Posted 09-12-2019 08:51 AM

Thank you!! I edited it a little bit, but it gave me exactly what I needed. The dataset is massive. Millions and millions of rows.

Calculating proportion of variable by another identifier

Re: Calculating proportion of variable by another identifier

Re: Calculating proportion of variable by another identifier

Re: Calculating proportion of variable by another identifier

Calculating proportion of variable by another identifier

Re: Calculating proportion of variable by another identifier

Re: Calculating proportion of variable by another identifier

Re: Calculating proportion of variable by another identifier

Click image to register for webinar

Classroom Training Available!