BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Financ3rn
Calcite | Level 5

Hi all,

 

I am trying to calculate a peer-bank Herfindahl–Hirschman Index (HHI) using bank loans each year. The HHI observation for each bank in my panel dataset will be the market concentration of all peer-banks, excluding the bank itself. Therefore, each bank-year observation will have a unique vale. I'm sure there is a macro that could do this, but I have not been able to create one that works. Here is a sample of my data showing the banks and loan amounts each year. Is there any macro that I could use to calculate this peer_hhi?

 

data loans;
  input year bank $ loan_amount;
  datalines;
2010	BankA	 100000 
2010	BankB	 150000 
2010	BankC	 200000 
2010	BankD	 120000 
2010	BankE	 1000000 
2011	BankA	 200000 
2011	BankB	 50000 
2011	BankC	 400000 
2011	BankD	 100000 
2011	BankE	 1500000 
;
run;

As an example, the variable I am looking for peer_hhi for BankA in 2010 would be calculated with the data for BankB, C, D, E in 2010. See below

 

peer_hhi for BankA in 2010:

Financ3rn_1-1687882525103.png

 

with the 1,470,000 calculated as the sum of all loans for BankB, C, D, E in 2010.

 

The final dataset I am trying to create would look like this:

yearbankloan_amountpeer_hhi
2010BankA              100,000          4,984
2010BankB              150,000          5,279
2010BankC              200,000          5,578
2010BankD              120,000          5,101
2010BankE           1,000,000          2,675
2011BankA              200,000          5,764
2011BankB                50,000          5,083
2011BankC              400,000          6,728
2011BankD              100,000          5,306
2011BankE           1,500,000          3,778

 

Thank you in advance for all your help. Please let me know if more information is needed.

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

The formula you provided suggests the following identity:

 

      Peer_HHI    = (Total_of_squared_peer_loans)  x  100**2    / (Total_of_peer_loans)**2

which is equivalent to

      [(Total_of_all_squared_loans) - Current_bank_loan_squared ] x 100**2   / (Total_of_all_loans - Current_bank_loan)**2

 

 

So read each year twice, the first time to generate totals for all banks, the second time to adjust for the currrent bank.  Assuming your data is sorted by year, then this will work:

 

data loans;
  input year bank $ loan_amount;
  datalines;
2010	BankA	 100000 
2010	BankB	 150000 
2010	BankC	 200000 
2010	BankD	 120000 
2010	BankE	 1000000 
2011	BankA	 200000 
2011	BankB	 50000 
2011	BankC	 400000 
2011	BankD	 100000 
2011	BankE	 1500000 
run;

data want (drop=_:);
  set loans (in=firstpass)  loans (in=secondpass);
  by year;

  if firstpass then do;
    _all_loans_total + loan_amount;
    _all_loans_sum_of_squares + loan_amount**2;
  end;
  if secondpass;
  peer_hhi=(_all_loans_sum_of_squares-(loan_amount**2))*(100**2)/((_all_loans_total-loan_amount)**2);
  if last.year then call missing(of _:);
run;

 

If the data are not sorted, and sorting is too expensive, then you have a problem amenable to a two SAS step solution: (1) run a proc summary to create the yearly totals for all banks, (2) in a DATA step, read in the proc summary results into a hash lookup table, and then reread the original data, making adjustments for each bank.

 

 

I put a little thought into avoiding running a PROC SORT on sorted data prior to the code above.  Consider this data, which is sorted by year/bank/region.  

 

data loans;
  input year bank $ loan_amount;
  do region=1 to 3; output; end;
  datalines;
2010	BankA	 100000 
2010	BankB	 150000 
2010	BankC	 200000 
2010	BankD	 120000 
2010	BankE	 1000000 
2011	BankA	 200000 
2011	BankB	 50000 
2011	BankC	 400000 
2011	BankD	 100000 
2011	BankE	 1500000 
run;

%let class_vars=year region;

proc summary data=loans noprint nway;
  var loan_amount;
  class &class_vars ;
  output out=need (drop=_type_ _freq_) sum=_all_loans_total  uss=_all_loans_sum_of_squares ;
run;

data want (drop=_:);
  set loans  need (obs=0);
  if _n_=1 then do;
    declare hash h (dataset:'need');
      do _w=1 to countw("&class_vars");
        h.definekey(scan("&class_vars",_w));
      end;
      h.definekey('year');
      h.definedata(all:'Y');
      h.definedone();
  end;
  h.find();
  peer_hhi=(_all_loans_sum_of_squares-(loan_amount**2))*(100**2)/((_all_loans_total-loan_amount)**2);
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

2 REPLIES 2
mkeintz
PROC Star

The formula you provided suggests the following identity:

 

      Peer_HHI    = (Total_of_squared_peer_loans)  x  100**2    / (Total_of_peer_loans)**2

which is equivalent to

      [(Total_of_all_squared_loans) - Current_bank_loan_squared ] x 100**2   / (Total_of_all_loans - Current_bank_loan)**2

 

 

So read each year twice, the first time to generate totals for all banks, the second time to adjust for the currrent bank.  Assuming your data is sorted by year, then this will work:

 

data loans;
  input year bank $ loan_amount;
  datalines;
2010	BankA	 100000 
2010	BankB	 150000 
2010	BankC	 200000 
2010	BankD	 120000 
2010	BankE	 1000000 
2011	BankA	 200000 
2011	BankB	 50000 
2011	BankC	 400000 
2011	BankD	 100000 
2011	BankE	 1500000 
run;

data want (drop=_:);
  set loans (in=firstpass)  loans (in=secondpass);
  by year;

  if firstpass then do;
    _all_loans_total + loan_amount;
    _all_loans_sum_of_squares + loan_amount**2;
  end;
  if secondpass;
  peer_hhi=(_all_loans_sum_of_squares-(loan_amount**2))*(100**2)/((_all_loans_total-loan_amount)**2);
  if last.year then call missing(of _:);
run;

 

If the data are not sorted, and sorting is too expensive, then you have a problem amenable to a two SAS step solution: (1) run a proc summary to create the yearly totals for all banks, (2) in a DATA step, read in the proc summary results into a hash lookup table, and then reread the original data, making adjustments for each bank.

 

 

I put a little thought into avoiding running a PROC SORT on sorted data prior to the code above.  Consider this data, which is sorted by year/bank/region.  

 

data loans;
  input year bank $ loan_amount;
  do region=1 to 3; output; end;
  datalines;
2010	BankA	 100000 
2010	BankB	 150000 
2010	BankC	 200000 
2010	BankD	 120000 
2010	BankE	 1000000 
2011	BankA	 200000 
2011	BankB	 50000 
2011	BankC	 400000 
2011	BankD	 100000 
2011	BankE	 1500000 
run;

%let class_vars=year region;

proc summary data=loans noprint nway;
  var loan_amount;
  class &class_vars ;
  output out=need (drop=_type_ _freq_) sum=_all_loans_total  uss=_all_loans_sum_of_squares ;
run;

data want (drop=_:);
  set loans  need (obs=0);
  if _n_=1 then do;
    declare hash h (dataset:'need');
      do _w=1 to countw("&class_vars");
        h.definekey(scan("&class_vars",_w));
      end;
      h.definekey('year');
      h.definedata(all:'Y');
      h.definedone();
  end;
  h.find();
  peer_hhi=(_all_loans_sum_of_squares-(loan_amount**2))*(100**2)/((_all_loans_total-loan_amount)**2);
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Financ3rn
Calcite | Level 5
This worked perfectly! Thank you for helping with this

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 626 views
  • 0 likes
  • 2 in conversation