Hi all,
I am trying to calculate a peer-bank Herfindahl–Hirschman Index (HHI) using bank loans each year. The HHI observation for each bank in my panel dataset will be the market concentration of all peer-banks, excluding the bank itself. Therefore, each bank-year observation will have a unique vale. I'm sure there is a macro that could do this, but I have not been able to create one that works. Here is a sample of my data showing the banks and loan amounts each year. Is there any macro that I could use to calculate this peer_hhi?
data loans;
input year bank $ loan_amount;
datalines;
2010 BankA 100000
2010 BankB 150000
2010 BankC 200000
2010 BankD 120000
2010 BankE 1000000
2011 BankA 200000
2011 BankB 50000
2011 BankC 400000
2011 BankD 100000
2011 BankE 1500000
;
run;
As an example, the variable I am looking for peer_hhi for BankA in 2010 would be calculated with the data for BankB, C, D, E in 2010. See below
peer_hhi for BankA in 2010:
with the 1,470,000 calculated as the sum of all loans for BankB, C, D, E in 2010.
The final dataset I am trying to create would look like this:
year | bank | loan_amount | peer_hhi |
2010 | BankA | 100,000 | 4,984 |
2010 | BankB | 150,000 | 5,279 |
2010 | BankC | 200,000 | 5,578 |
2010 | BankD | 120,000 | 5,101 |
2010 | BankE | 1,000,000 | 2,675 |
2011 | BankA | 200,000 | 5,764 |
2011 | BankB | 50,000 | 5,083 |
2011 | BankC | 400,000 | 6,728 |
2011 | BankD | 100,000 | 5,306 |
2011 | BankE | 1,500,000 | 3,778 |
Thank you in advance for all your help. Please let me know if more information is needed.
The formula you provided suggests the following identity:
Peer_HHI = (Total_of_squared_peer_loans) x 100**2 / (Total_of_peer_loans)**2
which is equivalent to
[(Total_of_all_squared_loans) - Current_bank_loan_squared ] x 100**2 / (Total_of_all_loans - Current_bank_loan)**2
So read each year twice, the first time to generate totals for all banks, the second time to adjust for the currrent bank. Assuming your data is sorted by year, then this will work:
data loans;
input year bank $ loan_amount;
datalines;
2010 BankA 100000
2010 BankB 150000
2010 BankC 200000
2010 BankD 120000
2010 BankE 1000000
2011 BankA 200000
2011 BankB 50000
2011 BankC 400000
2011 BankD 100000
2011 BankE 1500000
run;
data want (drop=_:);
set loans (in=firstpass) loans (in=secondpass);
by year;
if firstpass then do;
_all_loans_total + loan_amount;
_all_loans_sum_of_squares + loan_amount**2;
end;
if secondpass;
peer_hhi=(_all_loans_sum_of_squares-(loan_amount**2))*(100**2)/((_all_loans_total-loan_amount)**2);
if last.year then call missing(of _:);
run;
If the data are not sorted, and sorting is too expensive, then you have a problem amenable to a two SAS step solution: (1) run a proc summary to create the yearly totals for all banks, (2) in a DATA step, read in the proc summary results into a hash lookup table, and then reread the original data, making adjustments for each bank.
I put a little thought into avoiding running a PROC SORT on sorted data prior to the code above. Consider this data, which is sorted by year/bank/region.
data loans;
input year bank $ loan_amount;
do region=1 to 3; output; end;
datalines;
2010 BankA 100000
2010 BankB 150000
2010 BankC 200000
2010 BankD 120000
2010 BankE 1000000
2011 BankA 200000
2011 BankB 50000
2011 BankC 400000
2011 BankD 100000
2011 BankE 1500000
run;
%let class_vars=year region;
proc summary data=loans noprint nway;
var loan_amount;
class &class_vars ;
output out=need (drop=_type_ _freq_) sum=_all_loans_total uss=_all_loans_sum_of_squares ;
run;
data want (drop=_:);
set loans need (obs=0);
if _n_=1 then do;
declare hash h (dataset:'need');
do _w=1 to countw("&class_vars");
h.definekey(scan("&class_vars",_w));
end;
h.definekey('year');
h.definedata(all:'Y');
h.definedone();
end;
h.find();
peer_hhi=(_all_loans_sum_of_squares-(loan_amount**2))*(100**2)/((_all_loans_total-loan_amount)**2);
run;
The formula you provided suggests the following identity:
Peer_HHI = (Total_of_squared_peer_loans) x 100**2 / (Total_of_peer_loans)**2
which is equivalent to
[(Total_of_all_squared_loans) - Current_bank_loan_squared ] x 100**2 / (Total_of_all_loans - Current_bank_loan)**2
So read each year twice, the first time to generate totals for all banks, the second time to adjust for the currrent bank. Assuming your data is sorted by year, then this will work:
data loans;
input year bank $ loan_amount;
datalines;
2010 BankA 100000
2010 BankB 150000
2010 BankC 200000
2010 BankD 120000
2010 BankE 1000000
2011 BankA 200000
2011 BankB 50000
2011 BankC 400000
2011 BankD 100000
2011 BankE 1500000
run;
data want (drop=_:);
set loans (in=firstpass) loans (in=secondpass);
by year;
if firstpass then do;
_all_loans_total + loan_amount;
_all_loans_sum_of_squares + loan_amount**2;
end;
if secondpass;
peer_hhi=(_all_loans_sum_of_squares-(loan_amount**2))*(100**2)/((_all_loans_total-loan_amount)**2);
if last.year then call missing(of _:);
run;
If the data are not sorted, and sorting is too expensive, then you have a problem amenable to a two SAS step solution: (1) run a proc summary to create the yearly totals for all banks, (2) in a DATA step, read in the proc summary results into a hash lookup table, and then reread the original data, making adjustments for each bank.
I put a little thought into avoiding running a PROC SORT on sorted data prior to the code above. Consider this data, which is sorted by year/bank/region.
data loans;
input year bank $ loan_amount;
do region=1 to 3; output; end;
datalines;
2010 BankA 100000
2010 BankB 150000
2010 BankC 200000
2010 BankD 120000
2010 BankE 1000000
2011 BankA 200000
2011 BankB 50000
2011 BankC 400000
2011 BankD 100000
2011 BankE 1500000
run;
%let class_vars=year region;
proc summary data=loans noprint nway;
var loan_amount;
class &class_vars ;
output out=need (drop=_type_ _freq_) sum=_all_loans_total uss=_all_loans_sum_of_squares ;
run;
data want (drop=_:);
set loans need (obs=0);
if _n_=1 then do;
declare hash h (dataset:'need');
do _w=1 to countw("&class_vars");
h.definekey(scan("&class_vars",_w));
end;
h.definekey('year');
h.definedata(all:'Y');
h.definedone();
end;
h.find();
peer_hhi=(_all_loans_sum_of_squares-(loan_amount**2))*(100**2)/((_all_loans_total-loan_amount)**2);
run;
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.