BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sastuck
Pyrite | Level 9

Hey!

 

I am trying to create a categorical variable for different industries using NAICS code. Naics.com identifies 20 different industries:

 

11Agriculture, Forestry, Fishing and Hunting 
21Mining 
22Utilities 
23Construction 
31-33Manufacturing 
42Wholesale Trade 
44-45Retail Trade 
48-49Transportation and Warehousing 
51Information 
52Finance and Insurance 
53Real Estate Rental and Leasing 
54Professional, Scientific, and Technical Services 
55Management of Companies and Enterprises 
56Administrative and Support and Waste Management and Remediation Services 
61Educational Services 
62Health Care and Social Assistance 
71Arts, Entertainment, and Recreation 
72Accommodation and Food Services 
81Other Services (except Public Administration) 
92Public Administration

 

The data look just like the above codes--no extra zeros or anything. 

 

Without looking too scrupulously at the codes themselves, is this the right idea? 

 

*create categorical variable for industry;
data paper.ceo_firm3;
	set paper.ceo_firm2;
	if NAICS=11 then industry=outdoors;
	if NAICS=21 then industry=mining;
	if NAICS=22 then industry=utilities;
	if NAICS=23 then industry=construction;
	if NAICS=31 then industry=manufacturing;
	if NAICS=32 then industry=manufacturing;
	if NAICS=33 then industry=manufacturing;
	if NAICS=42 then industry=wholesale;
	if NAICS=44 then industry=retail;
	if NAICS=45 then industry=retail;
	if NAICS=48 then industry=transportation;
	if NAICS=49 then industry=transportation;
	if NAICS=51 then industry=information;
	if NAICS=52 then industry=finance;
	if NAICS=53 then industry=real_estate;
	if NAICS=54 then industry=professional; 
	if NAICS=55 then industry=management;
	if NAICS=56 then industry=waste;
	if NAICS=61 then industry=education;
	if NAICS=62 then industry=healthcare;
	if NAICS=71 then industry=arts;
	if NAICS=72 then industry=food;
	if NAICS=81 then industry=other;
	if NAICS=92 then industry=public_admin;
run;

Thanks for the feedback!

 

-SAStuck

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

No, your code is not syntactically correct.  To correct your code:

*create categorical variable for industry;
data paper.ceo_firm3;
  set paper.ceo_firm2;
  if NAICS=11 then industry="outdoors";
  if NAICS=21 then industry="mining";

However that is a bit of a faff, and then what if you want to sort per code order.  Another way of doing it is by formats:

 

View solution in original post

8 REPLIES 8
RW9
Diamond | Level 26 RW9
Diamond | Level 26

No, your code is not syntactically correct.  To correct your code:

*create categorical variable for industry;
data paper.ceo_firm3;
  set paper.ceo_firm2;
  if NAICS=11 then industry="outdoors";
  if NAICS=21 then industry="mining";

However that is a bit of a faff, and then what if you want to sort per code order.  Another way of doing it is by formats:

 

Reeza
Super User

Several websites (hint census) have this as a CSV file with the mapping’s down to the 4 & 6 digit NAICS. If you read in the CSV file and create a data set to pass to PROC FORMAT using CNTLIN you can avoid the manual typing out of all your codes. 

sastuck
Pyrite | Level 9

Thanks for the heads up @Reeza! How would I merge the census's data into my data set without any additional information on the company though, like TICKER or CONAME?

ballardw
Super User

@sastuck wrote:

Thanks for the heads up @Reeza! How would I merge the census's data into my data set without any additional information on the company though, like TICKER or CONAME?


Why would you need ticker or coname? You are asking about values of NAICS codes.

 

Here is how a custom format works:

proc format library=work;
value naics
11="outdoors"
21="mining"
22="utilities"
23="construction"
31="manufacturing"
32="manufacturing"
33="manufacturing"
42="wholesale"
;
run;

data junk;
   do x=11,21,22,23,33;
      output;
   end;
   format x naics.;
   label x ="Industry";
run;
proc print data=junk label;
var x;
run;

The source @Reeza mentioned would allow creating a pretty complete format.

 

sastuck
Pyrite | Level 9

i'm mentioning ticker and coname because in the end I want this data to be in my ceo_firm dataset. I'm not understanding how the code you shared would get the naics data into this new ceo_firm3 data set. Please enlighten me!

ballardw
Super User

@sastuck wrote:

i'm mentioning ticker and coname because in the end I want this data to be in my ceo_firm dataset. I'm not understanding how the code you shared would get the naics data into this new ceo_firm3 data set. Please enlighten me!


The information is already in your data: the NAICS code value. The format just displays it in a more human understandable way. There is often no need to create an additional variable if all the information you need is contained in an existing variable.

 

If you do not already have the NAICS code in your ceo_firm data set then that should have been your question.

sastuck
Pyrite | Level 9

It is in the dataset. I just want to use it to create the categorical variables

Reeza
Super User

Once you create a format you just need to apply it. You have two options:

1) Add a new variable, character, that contains the description.

2) Apply the format using PROC PRINT. 

 

A generic example:

 

proc format;
value age_group
low - 13 = 'Pre-Teen'
13 - 15 = 'Teen'
16 - high = 'Adult';
run;

title 'Example of an applied format';
proc print data=sashelp.class;
format age age_group.;
run;


data class;
set sashelp.class;
age_category = put(age, age_group.);
label age_category = 'Age Category';
run;

title 'Example of creating a new variable with the format';
proc print data=class;
run;

@sastuck wrote:

It is in the dataset. I just want to use it to create the categorical variables


 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1246 views
  • 0 likes
  • 4 in conversation