Hey!
I am trying to create a categorical variable for different industries using NAICS code. Naics.com identifies 20 different industries:
The data look just like the above codes--no extra zeros or anything.
Without looking too scrupulously at the codes themselves, is this the right idea?
*create categorical variable for industry;
data paper.ceo_firm3;
set paper.ceo_firm2;
if NAICS=11 then industry=outdoors;
if NAICS=21 then industry=mining;
if NAICS=22 then industry=utilities;
if NAICS=23 then industry=construction;
if NAICS=31 then industry=manufacturing;
if NAICS=32 then industry=manufacturing;
if NAICS=33 then industry=manufacturing;
if NAICS=42 then industry=wholesale;
if NAICS=44 then industry=retail;
if NAICS=45 then industry=retail;
if NAICS=48 then industry=transportation;
if NAICS=49 then industry=transportation;
if NAICS=51 then industry=information;
if NAICS=52 then industry=finance;
if NAICS=53 then industry=real_estate;
if NAICS=54 then industry=professional;
if NAICS=55 then industry=management;
if NAICS=56 then industry=waste;
if NAICS=61 then industry=education;
if NAICS=62 then industry=healthcare;
if NAICS=71 then industry=arts;
if NAICS=72 then industry=food;
if NAICS=81 then industry=other;
if NAICS=92 then industry=public_admin;
run;
Thanks for the feedback!
-SAStuck
No, your code is not syntactically correct. To correct your code:
*create categorical variable for industry;
data paper.ceo_firm3;
set paper.ceo_firm2;
if NAICS=11 then industry="outdoors";
if NAICS=21 then industry="mining";
However that is a bit of a faff, and then what if you want to sort per code order. Another way of doing it is by formats:
No, your code is not syntactically correct. To correct your code:
*create categorical variable for industry;
data paper.ceo_firm3;
set paper.ceo_firm2;
if NAICS=11 then industry="outdoors";
if NAICS=21 then industry="mining";
However that is a bit of a faff, and then what if you want to sort per code order. Another way of doing it is by formats:
Several websites (hint census) have this as a CSV file with the mapping’s down to the 4 & 6 digit NAICS. If you read in the CSV file and create a data set to pass to PROC FORMAT using CNTLIN you can avoid the manual typing out of all your codes.
Thanks for the heads up @Reeza! How would I merge the census's data into my data set without any additional information on the company though, like TICKER or CONAME?
@sastuck wrote:
Thanks for the heads up @Reeza! How would I merge the census's data into my data set without any additional information on the company though, like TICKER or CONAME?
Why would you need ticker or coname? You are asking about values of NAICS codes.
Here is how a custom format works:
proc format library=work; value naics 11="outdoors" 21="mining" 22="utilities" 23="construction" 31="manufacturing" 32="manufacturing" 33="manufacturing" 42="wholesale" ; run; data junk; do x=11,21,22,23,33; output; end; format x naics.; label x ="Industry"; run; proc print data=junk label; var x; run;
The source @Reeza mentioned would allow creating a pretty complete format.
i'm mentioning ticker and coname because in the end I want this data to be in my ceo_firm dataset. I'm not understanding how the code you shared would get the naics data into this new ceo_firm3 data set. Please enlighten me!
@sastuck wrote:
i'm mentioning ticker and coname because in the end I want this data to be in my ceo_firm dataset. I'm not understanding how the code you shared would get the naics data into this new ceo_firm3 data set. Please enlighten me!
The information is already in your data: the NAICS code value. The format just displays it in a more human understandable way. There is often no need to create an additional variable if all the information you need is contained in an existing variable.
If you do not already have the NAICS code in your ceo_firm data set then that should have been your question.
It is in the dataset. I just want to use it to create the categorical variables
Once you create a format you just need to apply it. You have two options:
1) Add a new variable, character, that contains the description.
2) Apply the format using PROC PRINT.
A generic example:
proc format;
value age_group
low - 13 = 'Pre-Teen'
13 - 15 = 'Teen'
16 - high = 'Adult';
run;
title 'Example of an applied format';
proc print data=sashelp.class;
format age age_group.;
run;
data class;
set sashelp.class;
age_category = put(age, age_group.);
label age_category = 'Age Category';
run;
title 'Example of creating a new variable with the format';
proc print data=class;
run;
@sastuck wrote:
It is in the dataset. I just want to use it to create the categorical variables
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.