************************************************************************************************************************;
/* SAS MACRO CODE FOR INDUSTRY CLASSIFICATION INTO 15 INDUSTRIES (prepared by Yaniv Konchitchki, U.C. Berkeley).
The code is based on Konchitchki, Yaniv. 2011. "Inflation and Nominal Financial Reporting:
Implications for Performance and Stock Prices." The Accounting Review 86 (3), 1045–1085.
This classification is also used in Barth, Mary E., Yaniv Konchitchki, Wayne R. Landsman. 2013. "Cost of
Capital and Earnings Transparency." Journal of Accounting and Economics 55 (2-3), 206–224.
Please cite these papers if you use this code. The macro creates a dataset "dataout" by adding to any existing
dataset "datain" the 15-industry classification based on "varForSICcode". The new dataset adds to your dataset
an industry variable named as specified in "varForIndDescOutput".
A suggested use:
%Add15KonchitchkiIndustries(datain=cstat, dataout=cstat, varForSICcode=sic, varForIndDescOutput=ind)
Note: The variable SIC comes directly from Compustat (it was DNUM in the historical Compustat legacy format). */
%macro Add15KonchitchkiIndustries(datain=,dataout=,varForSICcode=,varForIndDescOutput=);
    data &dataout;
        set &datain;
        /* Ensure that the input industry classification is a numeric variable */
        varForSICcodeNum = &varForSICcode*1;
        if (1000<=varForSICcodeNum< 1300) or (1399< varForSICcodeNum<=1999) then &varForIndDescOutput = "Mining, constructi.";
        else if (2000<=varForSICcodeNum<=2111) then &varForIndDescOutput = "Food";
        else if (2200<=varForSICcodeNum<=2799) then &varForIndDescOutput = "Textiles, printing.";
        else if (2800<=varForSICcodeNum<=2824) or (2840<=varForSICcodeNum<=2899) then &varForIndDescOutput = "Chemicals";
        else if (2830<=varForSICcodeNum<=2836) then &varForIndDescOutput = "Pharmaceuticals";
        else if (2900<=varForSICcodeNum<=2999) or (1300<=varForSICcodeNum<=1399) then &varForIndDescOutput = "Extractive Industr.";
        else if (3000<=varForSICcodeNum< 3570) or (3579< varForSICcodeNum< 3670)
             or (3679< varForSICcodeNum<=3999) then &varForIndDescOutput = "Durable";
        else if (7370<=varForSICcodeNum<=7379) or (3570<=varForSICcodeNum<=3579)
             or (3670<=varForSICcodeNum<=3679) then &varForIndDescOutput = "Computers";
        else if (4000<=varForSICcodeNum<=4899) then &varForIndDescOutput = "Transportation";
        else if (4900<=varForSICcodeNum<=4999) then &varForIndDescOutput = "Utilities";
        else if (5000<=varForSICcodeNum<=5999) then &varForIndDescOutput = "Retail";
        else if (6000<=varForSICcodeNum<=6411) then &varForIndDescOutput = "Financial Institut.";
        else if (6500<=varForSICcodeNum<=6999) then &varForIndDescOutput = "Insurance, real es.";
        else if (7000<=varForSICcodeNum< 7370) or (7379< varForSICcodeNum<=8999) then &varForIndDescOutput = "Services";
        else if (9000<=varForSICcodeNum) then &varForIndDescOutput = "Other";
        else &varForIndDescOutput = "Other";
        drop varForSICcodeNum;
    run;
%mend Add15KonchitchkiIndustries;
************************************************************************************************************************;
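As a quick check of the macro, here is a minimal sketch that runs it on a toy dataset. The dataset name cstat_demo, the identifier variable id, and the six SIC codes are made up for illustration and are not Compustat records; the macro is assumed to have been compiled already.

```sas
/* Hypothetical test data: six hand-picked SIC codes, one per row. */
data cstat_demo;
    input id $ sic;
    datalines;
A 1311
B 2834
C 3571
D 6020
E 7372
F 0100
;
run;

/* Add the industry label as a new character variable named ind. */
%Add15KonchitchkiIndustries(datain=cstat_demo, dataout=cstat_demo, varForSICcode=sic, varForIndDescOutput=ind)

/* Tabulate the assigned industries to inspect the mapping. */
proc freq data=cstat_demo;
    tables ind;
run;
```

Following the if/else chain by hand, 1311 falls in "Extractive Industr.", 2834 in "Pharmaceuticals", 3571 and 7372 in "Computers", 6020 in "Financial Institut.", and 0100 (below 1000) in "Other".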
Or use a format such as:
proc format library=work;
    /* -< excludes the upper bound and <- excludes the lower bound, */
    /* so each boundary code lands in the same industry as in the macro. */
    value SICCode
        1000 -< 1300, 1399 <- 1999                = "Mining, constructi."
        2000 - 2111                               = "Food"
        2200 - 2799                               = "Textiles, printing."
        2800 - 2824, 2840 - 2899                  = "Chemicals"
        2830 - 2836                               = "Pharmaceuticals"
        2900 - 2999, 1300 - 1399                  = "Extractive Industr."
        3000 -< 3570, 3579 <-< 3670, 3679 <- 3999 = "Durable"
        7370 - 7379, 3570 - 3579, 3670 - 3679     = "Computers"
        4000 - 4899                               = "Transportation"
        4900 - 4999                               = "Utilities"
        5000 - 5999                               = "Retail"
        6000 - 6411                               = "Financial Institut."
        6500 - 6999                               = "Insurance, real es."
        7000 -< 7370, 7379 <- 8999                = "Services"
        9000 - HIGH                               = "Other"
        other                                     = "Other";
run;
Then assign the SICCode. format to the SIC variable in an analysis or print step. SICCode is a numeric format, so the SIC variable must be numeric.
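A minimal sketch of both ways to apply the format, assuming a dataset cstat with a numeric variable sic (the names used in the macro's suggested call). The output dataset cstat_ind and the variable ind are illustrative names only.

```sas
/* Option 1: attach the format in the procedure itself. */
/* PROC FREQ groups by the formatted value, so no new variable is needed. */
proc freq data=cstat;
    tables sic;
    format sic SICCode.;
run;

/* Option 2: store the industry label as a character variable. */
data cstat_ind;
    set cstat;
    length ind $19;                 /* the longest labels are 19 characters */
    ind = put(sic, SICCode.);
    /* If sic is stored as character, convert it first: */
    /* ind = put(input(sic, 4.), SICCode.); */
run;
```

Option 1 leaves the data unchanged and is enough for tabulations. Option 2 is closer to what the macro produces, since the label is then available for merges and BY-group processing.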