BookmarkSubscribeRSS Feed
Phil_NZ
Barite | Level 11

Hi SAS Users,

 

Today I try to perform the HHI index calculation, I also read some documents from other topics (Topic 1, Topic 2) in this forum about  HHI calculation but I am still not able to apply these solutions to my case, could you please give me a hint to sort it out?

 

I also attach the dataset "canus_sample" under sas7bdat type below, a quick description is

TYPE	INDC3	GEOGN	YEAR	wACC_PAY	wMAR_TO_BOO	    wNET_SAL	wRETURN	     wSAL_GRO
719598	AUTMB	CANADA	1988	11027	    1.5294117647	94676	    .	         .
912664	AUTMB	CANADA	1988	.	        0.8048157351	13331775	.	         .
922180	AUTMB	CANADA	1988	.	        1.4089254416	476973	    .	         .
933574	AUTMB	CANADA	1988	.	        1.0194213877	212008	    .	         .
982028	AUTMB	CANADA	1988	.	        0.6533412888	1219683	    .	         .
510135	AUTMB	UNITEDS	1988	5377	    1.4341150666	94403	    .	         .
729665	AUTMB	UNITEDS	1988	.	        0.9192307692	757378	    .	         .
740787	AUTMB	UNITEDS	1988	.	        0.9102109965	259931	    .	         .
719598	AUTMB	CANADA	1989	11305	    1.1956521739	98336	   -0.153846154	 3.8658160463
912664	AUTMB	CANADA	1989	992489	    0.9512786263	13241091	0.368846756	-0.680209499
922180	AUTMB	CANADA	1989	22463	    1.5847339821	427928	    0.2	        -10.28255268

Among them, "TYPE', INDC3", "GEOGN", and "YEAR" are character variables while the rest are numeric variables.

INDC3: an indicator of industry

Type: an indicator of firms

So, I want to calculate the HHI index for each industry for each year for the whole dataset (regardless of the countries (that's why the data has been sorted by INDC3 and year already).

 

How to calculate the HHI index:

In each industry (INDC3), we sum up all wNET_SAL for each year. Afterward, we need to divide the wNET_SAL of each firm in this industry this year for the total wNET_SAL of this industry of this year, we call this result is wNET_SAL_OVE_IND. Lastly, we sum up the square of wNET_SAL_OVE_IND of all firms in this industry in this year and it is the HHI for this industry in this year.

 

For example, I am going to calculate HHI for industry "AUTMB" in year 1989

TYPE	INDC3	GEOGN	YEAR	wACC_PAY	wMAR_TO_BOO  	wNET_SAL
719598	AUTMB	CANADA	1989	11305	    1.1956521739	98336
912664	AUTMB	CANADA	1989	992489	    0.9512786263	13241091
922180	AUTMB	CANADA	1989	22463	    1.5847339821	427928
933574	AUTMB	CANADA	1989	20732	    1.0124768462	249152
982028	AUTMB	CANADA	1989	217887	    0.4379157428	1663509
510135	AUTMB	UNITEDS	1989	9253	    2.2179974651	136135
729665	AUTMB	UNITEDS	1989	40095	    1.4135472371	790967
740787	AUTMB	UNITEDS	1989	.	        1.6622800942	277114

First, I sum up all wNET_SAL of all firms in industry AUTMB in 1989 => sum_wNET_SAL= 98336+ 13241091+...+ 277114 =16884232

Then I divide the wNET_SAL of each firm in this industry this year for the total wNET_SAL of this industry of this year, we call this result is wNET_SAL_OVE_IND:

TYPE           wNET_SAL_OVE_IND
719598         0.00582
912664         0.78423
922180         0.0253
933574         0.01476
982028         0.0985
510135         0.0081
729665         0.0468
740787         0.01641

Then we square wNET_SAL_OVE_IND for all firms in this industry in this year then sum up together to have the HHI for this industry in this year. So HHI of the industry AUTMB in the year 1989 equal to 0.00582^2 + 0.07842^2+.....+0.01641^2

 

This is the list of INDC3 : 

INDC3
AUTMB BRESR CHMCL CNSTM CONPS ENEGY FDBEV HLTHC INDGS MEDIA NA PCDGS RTAIL TECNO TELCM TRLES UNCLS UQEQS
EDIT: I just code it and I have the code like that
* calculate market size by industry , year;
proc sql;
create table work.herfindahl1 as
  select Type, year, INDC3, wNET_SAL, sum(wNET_SAL) as market_size, count(INDC3) as firm_num
  from canus_sample
  group by INDC3,year
  ;
quit;

*get the market share squares;
data work.herfindahl2;
set work.herfindahl1;
if market_size = 0 or market_size = . then delete;
market_share_sqr = (wNET_SAL/market_size)**2;
run;

*calculate the herfindahl index;
proc sql;
create table work.herfindahl3 as
  select Type, year, INDC3, wNET_SAL, firm_num, market_size, market_share_sqr, sum(market_share_sqr) as Herfindahl
  from work.herfindahl2
  group by year, INDC3
  ;
quit;
   

proc sql;
create table work.herfindahl3 as
  select year, INDC3, sum(market_share_sqr) as Herfindahl
  from work.herfindahl2
  group by year, INDC3
  ;
quit;
And the result is 
My97_0-1613531929946.png

There are 3 questions here:

1. I am just wondering if there is any hidden error that I did not notice with the code above

 

2. Apart from that, regarding the code

group by INDC3,year

in PROC SQL, so if we use it, whether we need to proc sort the initial data by INDC3 and year? From this document, it says that I do not need to Proc Sort before doing SQL, I am just prudent about my code to make sure I do not make mistakes at this stage.

 

3. If the code above is correct, what should I do to set Herfindahl =.  if Herfindahl=1 or INDC3 in('UNCLS', 'UQEQS', 'NA')
 
For example, these observation should receive value '.'
 
My97_0-1613533208512.png

 

Many thanks and warmest regards.

 

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
5 REPLIES 5
ChrisNZ
Tourmaline | Level 20

Like this?

proc summary data=SASHELP.CLASS (rename=(SEX=INDUSTRY AGE=YEAR NAME=TYPE WEIGHT=SALES));
  class INDUSTRY
        YEAR   
        TYPE;
  var SALES;
  types INDUSTRY * YEAR 
        INDUSTRY * YEAR * TYPE ;
  output out=SUMS sum=;
run;

data WANT;
  merge SUMS(where=(_TYPE_=7) )
        SUMS(where=(_TYPE_=6) rename=(SALES=TOTAL));
  by INDUSTRY YEAR ;
  RATIO = SALES / TOTAL;
  HHI + RATIO**2;
  if last.YEAR then do;
    output; 
    HHI=0;
  end;
  keep INDUSTRY YEAR HHI; 
run;

 

Phil_NZ
Barite | Level 11

Hi @ChrisNZ 

Many thanks for your prompt reply, but it seems not what I am doing. 

 

I want to say sorry that I just updated my code for solving my problem above, could you please have a look and see if there is any hidden error that I did not notice.

 

Warmest regards.

P/S: The log is as below

28         * calculate market size by industry , year;
29         proc sql;
30         create table work.herfindahl1 as
31           select Type, year, INDC3, wNET_SAL, sum(wNET_SAL) as market_size, count(INDC3) as firm_num
32           from canus_sample
33           group by INDC3,year
34           ;
NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Table WORK.HERFINDAHL1 created, with 62000 rows and 6 columns.

35         quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds
      

36         
37         *get the market share squares;
38         data work.herfindahl2;
39         set work.herfindahl1;
40         if market_size = 0 or market_size = . then delete;
41         market_share_sqr = (wNET_SAL/market_size)**2;
42         run;

NOTE: There were 62000 observations read from the data set WORK.HERFINDAHL1.
NOTE: The data set WORK.HERFINDAHL2 has 62000 observations and 7 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
      
2                                                          The SAS System                         15:35 Wednesday, February 17, 2021


43         
44         *calculate the herfindahl index;
45         proc sql;
46         create table work.herfindahl3 as
47           select Type, year, INDC3, wNET_SAL, firm_num, market_size, market_share_sqr, sum(market_share_sqr) as Herfindahl
48           from work.herfindahl2
49           group by year, INDC3
50           ;
NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Table WORK.HERFINDAHL3 created, with 62000 rows and 8 columns.

51         quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.04 seconds
      cpu time            0.07 seconds
      

52         
53         
54         proc sql;
55         create table work.herfindahl3 as
56           select year, INDC3, sum(market_share_sqr) as Herfindahl
57           from work.herfindahl2
58           group by year, INDC3
59           ;
NOTE: Table WORK.HERFINDAHL3 created, with 491 rows and 3 columns.

60         quit;
Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
mkeintz
PROC Star

Here is a macro I wrote to calculate and report "Herfindahl-Hirschman Index Values for user-specified categories".  It takes advantage of the fact that proc summary can generate in one pass, not only sums of analysis variables, but also uncorrected sums of squares, for each cell (and row or column total and grand total) in a user-specified classification scheme.  Then  HHI is the uncorrected sum-of-squares/ square of the sum.

 

A sample use of it is (generate hhi for actual and predict, classified by year*prodtype, for each country).

 

proc sort data=sashelp.prdsale out=have;
  by country;
run;
%hhi_index(indsn=sashelp.prdsale,outdsn=want,report=yes,hhvars=actual predict,byvars=country,classvars=year prodtype );

 

In your case you want the HHI index for wnet_sal for each industry/year combination: 

%macro hhi_index(indsn=canus_sample,outdsn=want,hhvars=wnet_sal,classvars=indc3 year,report=yes);

I have validated this for a couple of datasets, but I haven't given it a complete torture test yet, so check results you get with this code.  Note it will give HHI for each year/INDC3 combination, as well as for each year, each indc3, and overall.  Also note that if you specify "report=yes", it doesn't take more than two classification variables at a time.

 

%macro hhi_index(indsn=_last_,outdsn=,hhvars=,byvars=,classvars=%str( ),report=NO)
  /des="Generate Herfindahl-Hirschman Index Values for user-specified categories" ;

  %local
     /*Mandatory Arguments */
       hhvars     /*REQUIRED:     The analysis variable(s) for HH indices                   */

	 /*Optional Arguments  */
       indsn      /*Source dataset name  (Default=_LAST_).                                  */
       outdsn     /*Destination dataset  (Default is blank). If blank, generate report only */
       byvars     /*By variable names  (Default: no by variables)                           */
       classvars  /*Subpopulation identifiers  (Default: no class vars)                     */
                  /*No more than 2 classvars allowed                                        */
       report     /*Report=YES means generate a report,  (default report=NO)                */

	 /*Created within the macro */
       hhinames   /*Names for new index variables corresponding to HHVARS                   */
       h hvar     /*For looping overs hhvars & hhinames                                     */
	   n_hhvars   /*Number of hhvars                                                        */
	   n_class    /*Number of classvars                                                     */
	   DSNLABEL   /*Label for the output dataset                                            */
     ;

  %let hhvars=%upcase(&hhvars);
  %let n_hhvars=%sysfunc(countw(&hhvars));
  %let byvars=%upcase(&byvars);
  %let classvars=%upcase(&classvars);
  %let n_class=%sysfunc(countw(%str(&classvars)));
  %let report=%upcase(&report);

  /* Build HHINAMES list corresponding to HHVARS (append "_HHI" to each original name */
  %do h=1 %to &n_hhvars;
    %let hvar=%scan(&hhvars,&h,%str( ));
    %if &h=1 %then %let hhinames=&hvar._HHI ;
    %else %let hhinames=&hhinames &hvar._HHI;
  %end;
  %put ********* &=hhinames ************;

  /* Get aggregate statistics (Uncorrected sum-of-squares and sums)  needed for HH index calculation  */
  proc summary data=&indsn chartype threads ;
    by &byvars;
    class &classvars;
    vars &hhvars ;
    output out=_need (label="Components needed for HH Index calculations")
           N= SUM= USS= MAX=/ autoname ways ;
  run;

  /* Sort row, col totals to FOLLOW individual rows and columns, for later use by proc report */
  proc sort data=_need; by &byvars descending _way_; run;


  %let DSNLABEL=HHI indices for variable(s) &hhvars., classified by (&CLASSVARS.);
  %if %length(&byvars)>0 %then %let DSNLABEL=&DSNLABEL., with byvars=&byvars.;
  %let DSNLABEL=(label="&DSNLABEL.");
  %put ********* &=DSNLABEL ********;

  /* If no OUTDSN specified then just build a dataset VIEW, and request a report generation  */
  %if &outdsn=%str() %then %do;
     %let OUTDSN=_VNEED ;
     %let report=YES;
  %end;


  /* Calculate the HH indices = uncorrected sum-of-squares / square of the sum   */
  data &outdsn &DSNLABEL
    %if &outdsn=_VNEED %then / view=_VNEED ;
  ;
    set _need;
	%do h=1 %to &n_hhvars;
      %let hvar=%scan(&hhvars,&h,%str( ));
      &hvar._HHI = &hvar._USS / &hvar._SUM**2;
      label &hvar._HHI = "HHI for &HVAR" ;
    %end;
  run;

  %if &report= YES %then %do;
    proc format ;   /* For use in PROC REPORT below */
      value dotall      .='Total';
	  value $blankall ' '='Total';
    run;

  /* Report layout depends on number of classvars */
    proc report data=_LAST_ ;
	  title1 "Herfindahl-Hirschman Index for %scan(&hhvars,1)" 
        %if &n_hhvars=2 %then " and %scan(&HHVARS,2)";
		%if %length(&byvars)>0 %then ", by &byvars (Page)";
      ;
      %if %length(&classvars)>0 %then 
        title2 "Categorized by %scan(&classvars,1)" %if &n_class=2 %then " (row) and %scan(&classvars,2) (col)";;;

      by &byvars;

      %if &n_class=2 %then

        column %scan(&classvars,1)  (&hhinames),%scan(&classvars,2),sum ;
  	  %else 
        column &classvars  (&hhinames),sum;
      ;
	
      %if &n_class=2 %then 
        define %scan(&classvars,2) / '' across order=data missing  ;
      ;
      %if &n_class>0 %then 
        define %scan(&classvars,1) / '' group  order=data  missing ;
      ;

      %do h=1 %to &n_hhvars;
        define %scan(&hhinames,&h,%str( )) /analysis  format=7.3 ;
      %end;
      define sum / '';
	  format _numeric_ dotall32. ;
	  format _character_ $blankall. ;
    run;
	title1;  /* Clear Titles */
  %end;

%mend hhi_index;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Phil_NZ
Barite | Level 11

Hi @mkeintz 

 

 

Many thanks for your dedicated code, I will have a look and may have some questions when reading the code. However, could you please have a look at my code above (along with some problems I am facing) and verify it in my case? I know my code is in a particular case and not comprehensive but sometimes it is simple enough and boosts my confidence before coming to your code (a little bit complicated at the first glance, lol).

 

Warmest regards.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
mkeintz
PROC Star

Here is the core of the code I provided, applied to sashelp.prdsale:  

 

proc summary data=SASHELP.prdsale;
  class prodtype  YEAR   ;
  var actual predict;
  output out=SUMS sum= uss=/ autoname ways;
run;
data hhi_values;
  set sums;
  hhi_actual=actual_uss/(actual_sum**2);
  hhi_predict=predict_uss/(predict_sum**2);
run;

Most of the code I supplied earlier was written to

  1. format the (optional) PROC REPORT
  2. make the analysis variables and their hhi derivatives into macro parameters.

 

Note the "autoname" option in proc summary is quite useful, since it establishes a naming pattern that concatenates the analysis variable names (ACTUAL and PREDICT here) with an underscore and the statistic name (SUM and USS here).  So you know exactly what names to use in the subsequent data step.  I.e. from ACTUAL, the above proc summary creates   ACTUAL_SUM and ACTUAL_USS.

 

Also, proc summary generates the statistics not only for each cell identified by product/year, but also for each year for all products, each product for all years, and for all years & products.  This allows you to see the HHI decomposition by category.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 3494 views
  • 3 likes
  • 3 in conversation