BookmarkSubscribeRSS Feed
LanMin
Fluorite | Level 6

Dear Forum Members,

Thanks for taking time to read my request !!!

Attached are two tables I want to link.

  1. I-O table contains three types of variables: (1) I-O industry,  (2) industry name (3) 1997 NAICS code
  2. NAICS97 table contains 1  variable only : NAICS

  1. My objective is to match I-O industry with NAICS in NAICS97 table

  1. e.g. I-O code 1111B0 corresponds to 11115, and in NAICS97 table, we have 111150 , so I would consider that a match, i.e. 1111B0 matches with 111150

I can see that the exact match may lead to very sparse matches. So the alternative is to use I-O industry at 4 digit level. E.g. I-O industry 1110 contains 10 different detailed I-O code, a coarse match would consider any of the following NAICS goes to 1110 industry and its sub-industry.

11111

11112

11113

11114

11115

11116

11119

1112

11131

11132

111331

111332

111333

111334

111336

111339

111335

1114

11191

11192

11193

111991

11194

111992

111998

Ideally I would like to first do an exact match, then see if I lose many observations, I will do the coarse match.

Could someone gives a sample code to do this?

Thanks,

Lan

7 REPLIES 7
LinusH
Tourmaline | Level 20

Is there a rule that describes that 1111BO could be matched with 111150?

Besides that, I would build a format from the NAICS table, with an OTHER value which you could test on.

So, in the data step, first put for an exact match, if it's not a hit, put with a substr() on I-O code (4 pos, then might even 3 pos if there are still misss matches.

Data never sleeps
LanMin
Fluorite | Level 6

Thanks, LinusH. I-O tables shows that on row 1111BO, it corresponds to 111113, 111114, 11115, 11116, 111119, I use 111150 as a match to right censor the zero.

what do you mean by 3 pos?

Fugue
Quartz | Level 8

The matching keys to NAICS already seem to exist in the I-O table. The exception seems to be at the I-O sector (no I-O code) and subsector level (4 digit I-O codes). But . . .

Setting the exceptions aside for now . . . .In your I-O table, all I-O codes below the sector and subsector level have one-to-many columns of matching NAICS codes. Note the NAICS codes in your I-O table are not necessarily at the lowest level of classification. (NAICS is broken down by sectors (2-digit), subsector (3-digit), industry groups (4-digit), industries (5-digit) and "national industries" (6-digit)). Each NAICS "code" in your I-O table can be matched to the corresponding NAICS codes, at the appropriate level of detail, in a table having a complete listing of NAICS codes.

But, your NAICS97 table does not have a complete enumeration of all NAICS codes.

For example, I-O industry 115000 (Agriculture and forestry support activities) shows that NAICS subsector 115 (Support Activities for Agriculture and Forestry) is related. (This implies that all 6-digit codes in NAICS subsector 115 are related to I-O industry 115000). But your NAICS97 table has no entries starting with 115.

Is NAICS97 deliberately incomplete (missing sector, subsector, industry group, industry or national industry)??

 

In any case, I do not think you can use the I-O code alone to match to NAICS, except very crudely at the 4-digit level. (Results would be inaccurate in many cases). You need to use the NAICS codes on your I-O table.

LanMin
Fluorite | Level 6

Thank you, Fugue!

my previous attached file NAICS97 contains only fiscal year 1997 data (at least according to the database I extract it from). Attached is a more complete NAICS list, it contains 115 as you suggested in your example.

I agree the exact match is not ideal. My understandings is I-O industry classification by Bureau of Economic Analysis has more detailed industry classifications than NAICS and SIC ,

suppose I want to match at 4 digit I-O industry level, how should I code this,

do I have to write many if statement, such as

i-o industry animal production 1130:

if NAICS="1131" OR "1132" OR "1133" THEN

NAICS_all table has the following entries

113

113110

113210

113310

which one should I match with which.

Fugue
Quartz | Level 8

As an aside, I doubt that the I-O codes are more detailed than NAICS: the I-O codes are basically a collapsed version of NAICS.

To reiterate my previous point, I don't think you can use the I-O code to match to NAICS codes, except crudely.

For example, I-O industry 1111A0 Oilseed farming is related to NAICS codes 11111 and 11112 according to your I-O table. Based on this sample, you could assume that the first four digits of the I-O code match the first 4 digits of the NAICS code in the NAICS97 table.

But, I-O industry 1111B0 Grain farming also begins with 1111; however, it is related to different NAICS codes (11113, 11114, 11115, 11116, and 11119). Matching between the first four digits of the I-O code and the first four digits in the NAICS97 table in the above two examples would return erroneous matches.

Given that you are trying to match the I-O file and the NAICS97 file for whatever reason, then I suggest that you match the "related 1997 NAICS codes" from the I-O file to the codes in the NAICS97 file. A couple of different ways you could do this:

  • split the I-O data so that you have one row for every "related 1997 NAICS code". For example, you could split the row for 1111A0 Oilseed farming into two rows: one for NAICS 11111 and one for 11112. Then you can match NAICS to NAICS.
  • alternatively, read the "related 1997 NAICS codes" in the I-O table into a dataset with separate columns for each related NAICS code. Then, match NAICS97 based on an "IN" comparison to that range of columns.

I strongly suggest you do not use the I-O codes to attempt exact matches to the NAICS table unless you are ok with fuzzy matches.

Fugue
Quartz | Level 8

FYI -- your revised NAICS97 table is still incomplete: it does not have every possible NAICS code. However, that may be irrelevant depending on where you are pulling the data from and what you are trying to accomplish.

Fugue
Quartz | Level 8

FYI -- here is a link to a document which contains the concordance between I-O industry and NAICS. Go to printed page 39 (pdf page 21).

http://www.bea.gov/scb/pdf/2002/12December/1202I-OAccounts2.pdf

This confirms my belief that the matching between your two files should be based on NAICS code.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1436 views
  • 0 likes
  • 3 in conversation