Hi SAS Users,
Today I try to perform the HHI index calculation, I also read some documents from other topics (Topic 1, Topic 2) in this forum about HHI calculation but I am still not able to apply these solutions to my case, could you please give me a hint to sort it out?
I also attach the dataset "canus_sample" under sas7bdat type below, a quick description is
TYPE INDC3 GEOGN YEAR wACC_PAY wMAR_TO_BOO wNET_SAL wRETURN wSAL_GRO
719598 AUTMB CANADA 1988 11027 1.5294117647 94676 . .
912664 AUTMB CANADA 1988 . 0.8048157351 13331775 . .
922180 AUTMB CANADA 1988 . 1.4089254416 476973 . .
933574 AUTMB CANADA 1988 . 1.0194213877 212008 . .
982028 AUTMB CANADA 1988 . 0.6533412888 1219683 . .
510135 AUTMB UNITEDS 1988 5377 1.4341150666 94403 . .
729665 AUTMB UNITEDS 1988 . 0.9192307692 757378 . .
740787 AUTMB UNITEDS 1988 . 0.9102109965 259931 . .
719598 AUTMB CANADA 1989 11305 1.1956521739 98336 -0.153846154 3.8658160463
912664 AUTMB CANADA 1989 992489 0.9512786263 13241091 0.368846756 -0.680209499
922180 AUTMB CANADA 1989 22463 1.5847339821 427928 0.2 -10.28255268
Among them, "TYPE', INDC3", "GEOGN", and "YEAR" are character variables while the rest are numeric variables.
INDC3: an indicator of industry
Type: an indicator of firms
So, I want to calculate the HHI index for each industry for each year for the whole dataset (regardless of the countries (that's why the data has been sorted by INDC3 and year already).
How to calculate the HHI index:
In each industry (INDC3), we sum up all wNET_SAL for each year. Afterward, we need to divide the wNET_SAL of each firm in this industry this year for the total wNET_SAL of this industry of this year, we call this result is wNET_SAL_OVE_IND. Lastly, we sum up the square of wNET_SAL_OVE_IND of all firms in this industry in this year and it is the HHI for this industry in this year.
For example, I am going to calculate HHI for industry "AUTMB" in year 1989
TYPE INDC3 GEOGN YEAR wACC_PAY wMAR_TO_BOO wNET_SAL
719598 AUTMB CANADA 1989 11305 1.1956521739 98336
912664 AUTMB CANADA 1989 992489 0.9512786263 13241091
922180 AUTMB CANADA 1989 22463 1.5847339821 427928
933574 AUTMB CANADA 1989 20732 1.0124768462 249152
982028 AUTMB CANADA 1989 217887 0.4379157428 1663509
510135 AUTMB UNITEDS 1989 9253 2.2179974651 136135
729665 AUTMB UNITEDS 1989 40095 1.4135472371 790967
740787 AUTMB UNITEDS 1989 . 1.6622800942 277114
First, I sum up all wNET_SAL of all firms in industry AUTMB in 1989 => sum_wNET_SAL= 98336+ 13241091+...+ 277114 =16884232
Then I divide the wNET_SAL of each firm in this industry this year for the total wNET_SAL of this industry of this year, we call this result is wNET_SAL_OVE_IND:
TYPE wNET_SAL_OVE_IND
719598 0.00582
912664 0.78423
922180 0.0253
933574 0.01476
982028 0.0985
510135 0.0081
729665 0.0468
740787 0.01641
Then we square wNET_SAL_OVE_IND for all firms in this industry in this year then sum up together to have the HHI for this industry in this year. So HHI of the industry AUTMB in the year 1989 equal to 0.00582^2 + 0.07842^2+.....+0.01641^2
This is the list of INDC3 :
INDC3
AUTMB
BRESR
CHMCL
CNSTM
CONPS
ENEGY
FDBEV
HLTHC
INDGS
MEDIA
NA
PCDGS
RTAIL
TECNO
TELCM
TRLES
UNCLS
UQEQS
EDIT: I just code it and I have the code like that
* calculate market size by industry , year;
proc sql;
create table work.herfindahl1 as
select Type, year, INDC3, wNET_SAL, sum(wNET_SAL) as market_size, count(INDC3) as firm_num
from canus_sample
group by INDC3,year
;
quit;
*get the market share squares;
data work.herfindahl2;
set work.herfindahl1;
if market_size = 0 or market_size = . then delete;
market_share_sqr = (wNET_SAL/market_size)**2;
run;
*calculate the herfindahl index;
proc sql;
create table work.herfindahl3 as
select Type, year, INDC3, wNET_SAL, firm_num, market_size, market_share_sqr, sum(market_share_sqr) as Herfindahl
from work.herfindahl2
group by year, INDC3
;
quit;
proc sql;
create table work.herfindahl3 as
select year, INDC3, sum(market_share_sqr) as Herfindahl
from work.herfindahl2
group by year, INDC3
;
quit;
And the result is
There are 3 questions here:
1. I am just wondering if there is any hidden error that I did not notice with the code above
2. Apart from that, regarding the code
group by INDC3,year
in PROC SQL, so if we use it, whether we need to proc sort the initial data by INDC3 and year? From this document, it says that I do not need to Proc Sort before doing SQL, I am just prudent about my code to make sure I do not make mistakes at this stage.
3. If the code above is correct, what should I do to set Herfindahl =. if Herfindahl=1 or INDC3 in( 'UNCLS', 'UQEQS', 'NA' )
For example, these observation should receive value '.'
Many thanks and warmest regards.
... View more