SAS Programming

Financ3rn · Posted 06-27-2023 04:03 PM

Hi all,

I am trying to calculate a peer-bank Herfindahl–Hirschman Index (HHI) using bank loans each year. The HHI observation for each bank in my panel dataset will be the market concentration of all peer-banks, excluding the bank itself. Therefore, each bank-year observation will have a unique vale. I'm sure there is a macro that could do this, but I have not been able to create one that works. Here is a sample of my data showing the banks and loan amounts each year. Is there any macro that I could use to calculate this peer_hhi?

data loans;
  input year bank $ loan_amount;
  datalines;
2010	BankA	 100000 
2010	BankB	 150000 
2010	BankC	 200000 
2010	BankD	 120000 
2010	BankE	 1000000 
2011	BankA	 200000 
2011	BankB	 50000 
2011	BankC	 400000 
2011	BankD	 100000 
2011	BankE	 1500000 
;
run;

As an example, the variable I am looking for peer_hhi for BankA in 2010 would be calculated with the data for BankB, C, D, E in 2010. See below

peer_hhi for BankA in 2010:

with the 1,470,000 calculated as the sum of all loans for BankB, C, D, E in 2010.

The final dataset I am trying to create would look like this:

year	bank	loan_amount	peer_hhi
2010	BankA	100,000	4,984
2010	BankB	150,000	5,279
2010	BankC	200,000	5,578
2010	BankD	120,000	5,101
2010	BankE	1,000,000	2,675
2011	BankA	200,000	5,764
2011	BankB	50,000	5,083
2011	BankC	400,000	6,728
2011	BankD	100,000	5,306
2011	BankE	1,500,000	3,778

Thank you in advance for all your help. Please let me know if more information is needed.

mkeintz · Posted 06-28-2023 05:32 PM

The formula you provided suggests the following identity:

Peer_HHI = (Total_of_squared_peer_loans) x 100**2 / (Total_of_peer_loans)**2

which is equivalent to

[(Total_of_all_squared_loans) - Current_bank_loan_squared ] x 100**2 / (Total_of_all_loans - Current_bank_loan)**2

So read each year twice, the first time to generate totals for all banks, the second time to adjust for the currrent bank. Assuming your data is sorted by year, then this will work:

data loans;
  input year bank $ loan_amount;
  datalines;
2010	BankA	 100000 
2010	BankB	 150000 
2010	BankC	 200000 
2010	BankD	 120000 
2010	BankE	 1000000 
2011	BankA	 200000 
2011	BankB	 50000 
2011	BankC	 400000 
2011	BankD	 100000 
2011	BankE	 1500000 
run;

data want (drop=_:);
  set loans (in=firstpass)  loans (in=secondpass);
  by year;

  if firstpass then do;
    _all_loans_total + loan_amount;
    _all_loans_sum_of_squares + loan_amount**2;
  end;
  if secondpass;
  peer_hhi=(_all_loans_sum_of_squares-(loan_amount**2))*(100**2)/((_all_loans_total-loan_amount)**2);
  if last.year then call missing(of _:);
run;

If the data are not sorted, and sorting is too expensive, then you have a problem amenable to a two SAS step solution: (1) run a proc summary to create the yearly totals for all banks, (2) in a DATA step, read in the proc summary results into a hash lookup table, and then reread the original data, making adjustments for each bank.

I put a little thought into avoiding running a PROC SORT on sorted data prior to the code above. Consider this data, which is sorted by year/bank/region.

data loans;
  input year bank $ loan_amount;
  do region=1 to 3; output; end;
  datalines;
2010	BankA	 100000 
2010	BankB	 150000 
2010	BankC	 200000 
2010	BankD	 120000 
2010	BankE	 1000000 
2011	BankA	 200000 
2011	BankB	 50000 
2011	BankC	 400000 
2011	BankD	 100000 
2011	BankE	 1500000 
run;

%let class_vars=year region;

proc summary data=loans noprint nway;
  var loan_amount;
  class &class_vars ;
  output out=need (drop=_type_ _freq_) sum=_all_loans_total  uss=_all_loans_sum_of_squares ;
run;

data want (drop=_:);
  set loans  need (obs=0);
  if _n_=1 then do;
    declare hash h (dataset:'need');
      do _w=1 to countw("&class_vars");
        h.definekey(scan("&class_vars",_w));
      end;
      h.definekey('year');
      h.definedata(all:'Y');
      h.definedone();
  end;
  h.find();
  peer_hhi=(_all_loans_sum_of_squares-(loan_amount**2))*(100**2)/((_all_loans_total-loan_amount)**2);
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

mkeintz · Posted 06-28-2023 05:32 PM

The formula you provided suggests the following identity:

Peer_HHI = (Total_of_squared_peer_loans) x 100**2 / (Total_of_peer_loans)**2

which is equivalent to

[(Total_of_all_squared_loans) - Current_bank_loan_squared ] x 100**2 / (Total_of_all_loans - Current_bank_loan)**2

So read each year twice, the first time to generate totals for all banks, the second time to adjust for the currrent bank. Assuming your data is sorted by year, then this will work:

data loans;
  input year bank $ loan_amount;
  datalines;
2010	BankA	 100000 
2010	BankB	 150000 
2010	BankC	 200000 
2010	BankD	 120000 
2010	BankE	 1000000 
2011	BankA	 200000 
2011	BankB	 50000 
2011	BankC	 400000 
2011	BankD	 100000 
2011	BankE	 1500000 
run;

data want (drop=_:);
  set loans (in=firstpass)  loans (in=secondpass);
  by year;

  if firstpass then do;
    _all_loans_total + loan_amount;
    _all_loans_sum_of_squares + loan_amount**2;
  end;
  if secondpass;
  peer_hhi=(_all_loans_sum_of_squares-(loan_amount**2))*(100**2)/((_all_loans_total-loan_amount)**2);
  if last.year then call missing(of _:);
run;

If the data are not sorted, and sorting is too expensive, then you have a problem amenable to a two SAS step solution: (1) run a proc summary to create the yearly totals for all banks, (2) in a DATA step, read in the proc summary results into a hash lookup table, and then reread the original data, making adjustments for each bank.

I put a little thought into avoiding running a PROC SORT on sorted data prior to the code above. Consider this data, which is sorted by year/bank/region.

data loans;
  input year bank $ loan_amount;
  do region=1 to 3; output; end;
  datalines;
2010	BankA	 100000 
2010	BankB	 150000 
2010	BankC	 200000 
2010	BankD	 120000 
2010	BankE	 1000000 
2011	BankA	 200000 
2011	BankB	 50000 
2011	BankC	 400000 
2011	BankD	 100000 
2011	BankE	 1500000 
run;

%let class_vars=year region;

proc summary data=loans noprint nway;
  var loan_amount;
  class &class_vars ;
  output out=need (drop=_type_ _freq_) sum=_all_loans_total  uss=_all_loans_sum_of_squares ;
run;

data want (drop=_:);
  set loans  need (obs=0);
  if _n_=1 then do;
    declare hash h (dataset:'need');
      do _w=1 to countw("&class_vars");
        h.definekey(scan("&class_vars",_w));
      end;
      h.definekey('year');
      h.definedata(all:'Y');
      h.definedone();
  end;
  h.find();
  peer_hhi=(_all_loans_sum_of_squares-(loan_amount**2))*(100**2)/((_all_loans_total-loan_amount)**2);
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Financ3rn · Posted 06-29-2023 04:08 PM

This worked perfectly! Thank you for helping with this

SAS Programming

Calculating peer-bank Herfindahl–Hirschman Index (HHI) using a macro in SAS 9.4

Re: Calculating peer-bank Herfindahl–Hirschman Index (HHI) using a macro in SAS 9.4

Re: Calculating peer-bank Herfindahl–Hirschman Index (HHI) using a macro in SAS 9.4

Re: Calculating peer-bank Herfindahl–Hirschman Index (HHI) using a macro in SAS 9.4

Follow Us

What is...

SAS Programming

Calculating peer-bank Herfindahl–Hirschman Index (HHI) using a macro in SAS 9.4

Re: Calculating peer-bank Herfindahl–Hirschman Index (HHI) using a macro in SAS 9.4

Re: Calculating peer-bank Herfindahl–Hirschman Index (HHI) using a macro in SAS 9.4

Re: Calculating peer-bank Herfindahl–Hirschman Index (HHI) using a macro in SAS 9.4

Our biggest data and AI event of the year.

SAS Training: Just a Click Away

Follow Us

What is...