DATA Step, Macro, Functions and more

link two tables - not exact match

Reply
Frequent Contributor
Posts: 102

link two tables - not exact match

Dear Forum Members,

Thanks for taking time to read my request !!!

Attached are two tables I want to link.

  1. I-O table contains three types of variables: (1) I-O industry,  (2) industry name (3) 1997 NAICS code
  2. NAICS97 table contains 1  variable only : NAICS

  1. My objective is to match I-O industry with NAICS in NAICS97 table

  1. e.g. I-O code 1111B0 corresponds to 11115, and in NAICS97 table, we have 111150 , so I would consider that a match, i.e. 1111B0 matches with 111150

I can see that the exact match may lead to very sparse matches. So the alternative is to use I-O industry at 4 digit level. E.g. I-O industry 1110 contains 10 different detailed I-O code, a coarse match would consider any of the following NAICS goes to 1110 industry and its sub-industry.

11111

11112

11113

11114

11115

11116

11119

1112

11131

11132

111331

111332

111333

111334

111336

111339

111335

1114

11191

11192

11193

111991

11194

111992

111998

Ideally I would like to first do an exact match, then see if I lose many observations, I will do the coarse match.

Could someone gives a sample code to do this?

Thanks,

Lan

Attachment
Attachment
Super User
Posts: 5,433

Re: link two tables - not exact match

Is there a rule that describes that 1111BO could be matched with 111150?

Besides that, I would build a format from the NAICS table, with an OTHER value which you could test on.

So, in the data step, first put for an exact match, if it's not a hit, put with a substr() on I-O code (4 pos, then might even 3 pos if there are still misss matches.

Data never sleeps
Frequent Contributor
Posts: 102

Re: link two tables - not exact match

Thanks, LinusH. I-O tables shows that on row 1111BO, it corresponds to 111113, 111114, 11115, 11116, 111119, I use 111150 as a match to right censor the zero.

what do you mean by 3 pos?

Super Contributor
Posts: 307

Re: link two tables - not exact match

The matching keys to NAICS already seem to exist in the I-O table. The exception seems to be at the I-O sector (no I-O code) and subsector level (4 digit I-O codes). But . . .

Setting the exceptions aside for now . . . .In your I-O table, all I-O codes below the sector and subsector level have one-to-many columns of matching NAICS codes. Note the NAICS codes in your I-O table are not necessarily at the lowest level of classification. (NAICS is broken down by sectors (2-digit), subsector (3-digit), industry groups (4-digit), industries (5-digit) and "national industries" (6-digit)). Each NAICS "code" in your I-O table can be matched to the corresponding NAICS codes, at the appropriate level of detail, in a table having a complete listing of NAICS codes.

But, your NAICS97 table does not have a complete enumeration of all NAICS codes.

For example, I-O industry 115000 (Agriculture and forestry support activities) shows that NAICS subsector 115 (Support Activities for Agriculture and Forestry) is related. (This implies that all 6-digit codes in NAICS subsector 115 are related to I-O industry 115000). But your NAICS97 table has no entries starting with 115.

Is NAICS97 deliberately incomplete (missing sector, subsector, industry group, industry or national industry)??

 

In any case, I do not think you can use the I-O code alone to match to NAICS, except very crudely at the 4-digit level. (Results would be inaccurate in many cases). You need to use the NAICS codes on your I-O table.

Frequent Contributor
Posts: 102

Re: link two tables - not exact match

Thank you, Fugue!

my previous attached file NAICS97 contains only fiscal year 1997 data (at least according to the database I extract it from). Attached is a more complete NAICS list, it contains 115 as you suggested in your example.

I agree the exact match is not ideal. My understandings is I-O industry classification by Bureau of Economic Analysis has more detailed industry classifications than NAICS and SIC ,

suppose I want to match at 4 digit I-O industry level, how should I code this,

do I have to write many if statement, such as

i-o industry animal production 1130:

if NAICS="1131" OR "1132" OR "1133" THEN

NAICS_all table has the following entries

113

113110

113210

113310

which one should I match with which.

Attachment
Super Contributor
Posts: 307

Re: link two tables - not exact match

As an aside, I doubt that the I-O codes are more detailed than NAICS: the I-O codes are basically a collapsed version of NAICS.

To reiterate my previous point, I don't think you can use the I-O code to match to NAICS codes, except crudely.

For example, I-O industry 1111A0 Oilseed farming is related to NAICS codes 11111 and 11112 according to your I-O table. Based on this sample, you could assume that the first four digits of the I-O code match the first 4 digits of the NAICS code in the NAICS97 table.

But, I-O industry 1111B0 Grain farming also begins with 1111; however, it is related to different NAICS codes (11113, 11114, 11115, 11116, and 11119). Matching between the first four digits of the I-O code and the first four digits in the NAICS97 table in the above two examples would return erroneous matches.

Given that you are trying to match the I-O file and the NAICS97 file for whatever reason, then I suggest that you match the "related 1997 NAICS codes" from the I-O file to the codes in the NAICS97 file. A couple of different ways you could do this:

  • split the I-O data so that you have one row for every "related 1997 NAICS code". For example, you could split the row for 1111A0 Oilseed farming into two rows: one for NAICS 11111 and one for 11112. Then you can match NAICS to NAICS.
  • alternatively, read the "related 1997 NAICS codes" in the I-O table into a dataset with separate columns for each related NAICS code. Then, match NAICS97 based on an "IN" comparison to that range of columns.

I strongly suggest you do not use the I-O codes to attempt exact matches to the NAICS table unless you are ok with fuzzy matches.

Super Contributor
Posts: 307

Re: link two tables - not exact match

FYI -- your revised NAICS97 table is still incomplete: it does not have every possible NAICS code. However, that may be irrelevant depending on where you are pulling the data from and what you are trying to accomplish.

Super Contributor
Posts: 307

Re: link two tables - not exact match

FYI -- here is a link to a document which contains the concordance between I-O industry and NAICS. Go to printed page 39 (pdf page 21).

http://www.bea.gov/scb/pdf/2002/12December/1202I-OAccounts2.pdf

This confirms my belief that the matching between your two files should be based on NAICS code.

Ask a Question
Discussion stats
  • 7 replies
  • 498 views
  • 0 likes
  • 3 in conversation