- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi SAS Users,
Today I try to perform the HHI index calculation, I also read some documents from other topics (Topic 1, Topic 2) in this forum about HHI calculation but I am still not able to apply these solutions to my case, could you please give me a hint to sort it out?
I also attach the dataset "canus_sample" under sas7bdat type below, a quick description is
TYPE INDC3 GEOGN YEAR wACC_PAY wMAR_TO_BOO wNET_SAL wRETURN wSAL_GRO
719598 AUTMB CANADA 1988 11027 1.5294117647 94676 . .
912664 AUTMB CANADA 1988 . 0.8048157351 13331775 . .
922180 AUTMB CANADA 1988 . 1.4089254416 476973 . .
933574 AUTMB CANADA 1988 . 1.0194213877 212008 . .
982028 AUTMB CANADA 1988 . 0.6533412888 1219683 . .
510135 AUTMB UNITEDS 1988 5377 1.4341150666 94403 . .
729665 AUTMB UNITEDS 1988 . 0.9192307692 757378 . .
740787 AUTMB UNITEDS 1988 . 0.9102109965 259931 . .
719598 AUTMB CANADA 1989 11305 1.1956521739 98336 -0.153846154 3.8658160463
912664 AUTMB CANADA 1989 992489 0.9512786263 13241091 0.368846756 -0.680209499
922180 AUTMB CANADA 1989 22463 1.5847339821 427928 0.2 -10.28255268
Among them, "TYPE', INDC3", "GEOGN", and "YEAR" are character variables while the rest are numeric variables.
INDC3: an indicator of industry
Type: an indicator of firms
So, I want to calculate the HHI index for each industry for each year for the whole dataset (regardless of the countries (that's why the data has been sorted by INDC3 and year already).
How to calculate the HHI index:
In each industry (INDC3), we sum up all wNET_SAL for each year. Afterward, we need to divide the wNET_SAL of each firm in this industry this year for the total wNET_SAL of this industry of this year, we call this result is wNET_SAL_OVE_IND. Lastly, we sum up the square of wNET_SAL_OVE_IND of all firms in this industry in this year and it is the HHI for this industry in this year.
For example, I am going to calculate HHI for industry "AUTMB" in year 1989
TYPE INDC3 GEOGN YEAR wACC_PAY wMAR_TO_BOO wNET_SAL
719598 AUTMB CANADA 1989 11305 1.1956521739 98336
912664 AUTMB CANADA 1989 992489 0.9512786263 13241091
922180 AUTMB CANADA 1989 22463 1.5847339821 427928
933574 AUTMB CANADA 1989 20732 1.0124768462 249152
982028 AUTMB CANADA 1989 217887 0.4379157428 1663509
510135 AUTMB UNITEDS 1989 9253 2.2179974651 136135
729665 AUTMB UNITEDS 1989 40095 1.4135472371 790967
740787 AUTMB UNITEDS 1989 . 1.6622800942 277114
First, I sum up all wNET_SAL of all firms in industry AUTMB in 1989 => sum_wNET_SAL= 98336+ 13241091+...+ 277114 =16884232
Then I divide the wNET_SAL of each firm in this industry this year for the total wNET_SAL of this industry of this year, we call this result is wNET_SAL_OVE_IND:
TYPE wNET_SAL_OVE_IND
719598 0.00582
912664 0.78423
922180 0.0253
933574 0.01476
982028 0.0985
510135 0.0081
729665 0.0468
740787 0.01641
Then we square wNET_SAL_OVE_IND for all firms in this industry in this year then sum up together to have the HHI for this industry in this year. So HHI of the industry AUTMB in the year 1989 equal to 0.00582^2 + 0.07842^2+.....+0.01641^2
This is the list of INDC3 :
INDC3
AUTMB
BRESR
CHMCL
CNSTM
CONPS
ENEGY
FDBEV
HLTHC
INDGS
MEDIA
NA
PCDGS
RTAIL
TECNO
TELCM
TRLES
UNCLS
UQEQS
EDIT: I just code it and I have the code like that* calculate market size by industry , year;
proc sql;
create table work.herfindahl1 as
select Type, year, INDC3, wNET_SAL, sum(wNET_SAL) as market_size, count(INDC3) as firm_num
from canus_sample
group by INDC3,year
;
quit;
*get the market share squares;
data work.herfindahl2;
set work.herfindahl1;
if market_size = 0 or market_size = . then delete;
market_share_sqr = (wNET_SAL/market_size)**2;
run;
*calculate the herfindahl index;
proc sql;
create table work.herfindahl3 as
select Type, year, INDC3, wNET_SAL, firm_num, market_size, market_share_sqr, sum(market_share_sqr) as Herfindahl
from work.herfindahl2
group by year, INDC3
;
quit;
proc sql;
create table work.herfindahl3 as
select year, INDC3, sum(market_share_sqr) as Herfindahl
from work.herfindahl2
group by year, INDC3
;
quit;
And the result is There are 3 questions here:
1. I am just wondering if there is any hidden error that I did not notice with the code above
2. Apart from that, regarding the code
group by INDC3,year
in PROC SQL, so if we use it, whether we need to proc sort the initial data by INDC3 and year? From this document, it says that I do not need to Proc Sort before doing SQL, I am just prudent about my code to make sure I do not make mistakes at this stage.
'UNCLS', 'UQEQS', 'NA'
)
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Like this?
proc summary data=SASHELP.CLASS (rename=(SEX=INDUSTRY AGE=YEAR NAME=TYPE WEIGHT=SALES));
class INDUSTRY
YEAR
TYPE;
var SALES;
types INDUSTRY * YEAR
INDUSTRY * YEAR * TYPE ;
output out=SUMS sum=;
run;
data WANT;
merge SUMS(where=(_TYPE_=7) )
SUMS(where=(_TYPE_=6) rename=(SALES=TOTAL));
by INDUSTRY YEAR ;
RATIO = SALES / TOTAL;
HHI + RATIO**2;
if last.YEAR then do;
output;
HHI=0;
end;
keep INDUSTRY YEAR HHI;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @ChrisNZ
Many thanks for your prompt reply, but it seems not what I am doing.
I want to say sorry that I just updated my code for solving my problem above, could you please have a look and see if there is any hidden error that I did not notice.
Warmest regards.
P/S: The log is as below
28 * calculate market size by industry , year;
29 proc sql;
30 create table work.herfindahl1 as
31 select Type, year, INDC3, wNET_SAL, sum(wNET_SAL) as market_size, count(INDC3) as firm_num
32 from canus_sample
33 group by INDC3,year
34 ;
NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Table WORK.HERFINDAHL1 created, with 62000 rows and 6 columns.
35 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds
36
37 *get the market share squares;
38 data work.herfindahl2;
39 set work.herfindahl1;
40 if market_size = 0 or market_size = . then delete;
41 market_share_sqr = (wNET_SAL/market_size)**2;
42 run;
NOTE: There were 62000 observations read from the data set WORK.HERFINDAHL1.
NOTE: The data set WORK.HERFINDAHL2 has 62000 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
2 The SAS System 15:35 Wednesday, February 17, 2021
43
44 *calculate the herfindahl index;
45 proc sql;
46 create table work.herfindahl3 as
47 select Type, year, INDC3, wNET_SAL, firm_num, market_size, market_share_sqr, sum(market_share_sqr) as Herfindahl
48 from work.herfindahl2
49 group by year, INDC3
50 ;
NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Table WORK.HERFINDAHL3 created, with 62000 rows and 8 columns.
51 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.04 seconds
cpu time 0.07 seconds
52
53
54 proc sql;
55 create table work.herfindahl3 as
56 select year, INDC3, sum(market_share_sqr) as Herfindahl
57 from work.herfindahl2
58 group by year, INDC3
59 ;
NOTE: Table WORK.HERFINDAHL3 created, with 491 rows and 3 columns.
60 quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is a macro I wrote to calculate and report "Herfindahl-Hirschman Index Values for user-specified categories". It takes advantage of the fact that proc summary can generate in one pass, not only sums of analysis variables, but also uncorrected sums of squares, for each cell (and row or column total and grand total) in a user-specified classification scheme. Then HHI is the uncorrected sum-of-squares/ square of the sum.
A sample use of it is (generate hhi for actual and predict, classified by year*prodtype, for each country).
proc sort data=sashelp.prdsale out=have;
by country;
run;
%hhi_index(indsn=sashelp.prdsale,outdsn=want,report=yes,hhvars=actual predict,byvars=country,classvars=year prodtype );
In your case you want the HHI index for wnet_sal for each industry/year combination:
%macro hhi_index(indsn=canus_sample,outdsn=want,hhvars=wnet_sal,classvars=indc3 year,report=yes);
I have validated this for a couple of datasets, but I haven't given it a complete torture test yet, so check results you get with this code. Note it will give HHI for each year/INDC3 combination, as well as for each year, each indc3, and overall. Also note that if you specify "report=yes", it doesn't take more than two classification variables at a time.
%macro hhi_index(indsn=_last_,outdsn=,hhvars=,byvars=,classvars=%str( ),report=NO)
/des="Generate Herfindahl-Hirschman Index Values for user-specified categories" ;
%local
/*Mandatory Arguments */
hhvars /*REQUIRED: The analysis variable(s) for HH indices */
/*Optional Arguments */
indsn /*Source dataset name (Default=_LAST_). */
outdsn /*Destination dataset (Default is blank). If blank, generate report only */
byvars /*By variable names (Default: no by variables) */
classvars /*Subpopulation identifiers (Default: no class vars) */
/*No more than 2 classvars allowed */
report /*Report=YES means generate a report, (default report=NO) */
/*Created within the macro */
hhinames /*Names for new index variables corresponding to HHVARS */
h hvar /*For looping overs hhvars & hhinames */
n_hhvars /*Number of hhvars */
n_class /*Number of classvars */
DSNLABEL /*Label for the output dataset */
;
%let hhvars=%upcase(&hhvars);
%let n_hhvars=%sysfunc(countw(&hhvars));
%let byvars=%upcase(&byvars);
%let classvars=%upcase(&classvars);
%let n_class=%sysfunc(countw(%str(&classvars)));
%let report=%upcase(&report);
/* Build HHINAMES list corresponding to HHVARS (append "_HHI" to each original name */
%do h=1 %to &n_hhvars;
%let hvar=%scan(&hhvars,&h,%str( ));
%if &h=1 %then %let hhinames=&hvar._HHI ;
%else %let hhinames=&hhinames &hvar._HHI;
%end;
%put ********* &=hhinames ************;
/* Get aggregate statistics (Uncorrected sum-of-squares and sums) needed for HH index calculation */
proc summary data=&indsn chartype threads ;
by &byvars;
class &classvars;
vars &hhvars ;
output out=_need (label="Components needed for HH Index calculations")
N= SUM= USS= MAX=/ autoname ways ;
run;
/* Sort row, col totals to FOLLOW individual rows and columns, for later use by proc report */
proc sort data=_need; by &byvars descending _way_; run;
%let DSNLABEL=HHI indices for variable(s) &hhvars., classified by (&CLASSVARS.);
%if %length(&byvars)>0 %then %let DSNLABEL=&DSNLABEL., with byvars=&byvars.;
%let DSNLABEL=(label="&DSNLABEL.");
%put ********* &=DSNLABEL ********;
/* If no OUTDSN specified then just build a dataset VIEW, and request a report generation */
%if &outdsn=%str() %then %do;
%let OUTDSN=_VNEED ;
%let report=YES;
%end;
/* Calculate the HH indices = uncorrected sum-of-squares / square of the sum */
data &outdsn &DSNLABEL
%if &outdsn=_VNEED %then / view=_VNEED ;
;
set _need;
%do h=1 %to &n_hhvars;
%let hvar=%scan(&hhvars,&h,%str( ));
&hvar._HHI = &hvar._USS / &hvar._SUM**2;
label &hvar._HHI = "HHI for &HVAR" ;
%end;
run;
%if &report= YES %then %do;
proc format ; /* For use in PROC REPORT below */
value dotall .='Total';
value $blankall ' '='Total';
run;
/* Report layout depends on number of classvars */
proc report data=_LAST_ ;
title1 "Herfindahl-Hirschman Index for %scan(&hhvars,1)"
%if &n_hhvars=2 %then " and %scan(&HHVARS,2)";
%if %length(&byvars)>0 %then ", by &byvars (Page)";
;
%if %length(&classvars)>0 %then
title2 "Categorized by %scan(&classvars,1)" %if &n_class=2 %then " (row) and %scan(&classvars,2) (col)";;;
by &byvars;
%if &n_class=2 %then
column %scan(&classvars,1) (&hhinames),%scan(&classvars,2),sum ;
%else
column &classvars (&hhinames),sum;
;
%if &n_class=2 %then
define %scan(&classvars,2) / '' across order=data missing ;
;
%if &n_class>0 %then
define %scan(&classvars,1) / '' group order=data missing ;
;
%do h=1 %to &n_hhvars;
define %scan(&hhinames,&h,%str( )) /analysis format=7.3 ;
%end;
define sum / '';
format _numeric_ dotall32. ;
format _character_ $blankall. ;
run;
title1; /* Clear Titles */
%end;
%mend hhi_index;
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @mkeintz
Many thanks for your dedicated code, I will have a look and may have some questions when reading the code. However, could you please have a look at my code above (along with some problems I am facing) and verify it in my case? I know my code is in a particular case and not comprehensive but sometimes it is simple enough and boosts my confidence before coming to your code (a little bit complicated at the first glance, lol).
Warmest regards.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is the core of the code I provided, applied to sashelp.prdsale:
proc summary data=SASHELP.prdsale;
class prodtype YEAR ;
var actual predict;
output out=SUMS sum= uss=/ autoname ways;
run;
data hhi_values;
set sums;
hhi_actual=actual_uss/(actual_sum**2);
hhi_predict=predict_uss/(predict_sum**2);
run;
Most of the code I supplied earlier was written to
- format the (optional) PROC REPORT
- make the analysis variables and their hhi derivatives into macro parameters.
Note the "autoname" option in proc summary is quite useful, since it establishes a naming pattern that concatenates the analysis variable names (ACTUAL and PREDICT here) with an underscore and the statistic name (SUM and USS here). So you know exactly what names to use in the subsequent data step. I.e. from ACTUAL, the above proc summary creates ACTUAL_SUM and ACTUAL_USS.
Also, proc summary generates the statistics not only for each cell identified by product/year, but also for each year for all products, each product for all years, and for all years & products. This allows you to see the HHI decomposition by category.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------