Dear Forum Members,
Thanks for taking time to read my request !!!
Attached are two tables I want to link.
I can see that the exact match may lead to very sparse matches. So the alternative is to use I-O industry at 4 digit level. E.g. I-O industry 1110 contains 10 different detailed I-O code, a coarse match would consider any of the following NAICS goes to 1110 industry and its sub-industry.
11111 | 11112 | ||||||
11113 | 11114 | 11115 | 11116 | 11119 | |||
1112 | |||||||
11131 | 11132 | 111331 | 111332 | 111333 | 111334 | 111336 | 111339 |
111335 | |||||||
1114 | |||||||
11191 | |||||||
11192 | |||||||
11193 | 111991 | ||||||
11194 | 111992 | 111998 |
Ideally I would like to first do an exact match, then see if I lose many observations, I will do the coarse match.
Could someone gives a sample code to do this?
Thanks,
Lan
Is there a rule that describes that 1111BO could be matched with 111150?
Besides that, I would build a format from the NAICS table, with an OTHER value which you could test on.
So, in the data step, first put for an exact match, if it's not a hit, put with a substr() on I-O code (4 pos, then might even 3 pos if there are still misss matches.
Thanks, LinusH. I-O tables shows that on row 1111BO, it corresponds to 111113, 111114, 11115, 11116, 111119, I use 111150 as a match to right censor the zero.
what do you mean by 3 pos?
The matching keys to NAICS already seem to exist in the I-O table. The exception seems to be at the I-O sector (no I-O code) and subsector level (4 digit I-O codes). But . . .
Setting the exceptions aside for now . . . .In your I-O table, all I-O codes below the sector and subsector level have one-to-many columns of matching NAICS codes. Note the NAICS codes in your I-O table are not necessarily at the lowest level of classification. (NAICS is broken down by sectors (2-digit), subsector (3-digit), industry groups (4-digit), industries (5-digit) and "national industries" (6-digit)). Each NAICS "code" in your I-O table can be matched to the corresponding NAICS codes, at the appropriate level of detail, in a table having a complete listing of NAICS codes.
But, your NAICS97 table does not have a complete enumeration of all NAICS codes.
For example, I-O industry 115000 (Agriculture and forestry support activities) shows that NAICS subsector 115 (Support Activities for Agriculture and Forestry) is related. (This implies that all 6-digit codes in NAICS subsector 115 are related to I-O industry 115000). But your NAICS97 table has no entries starting with 115.
Is NAICS97 deliberately incomplete (missing sector, subsector, industry group, industry or national industry)??
In any case, I do not think you can use the I-O code alone to match to NAICS, except very crudely at the 4-digit level. (Results would be inaccurate in many cases). You need to use the NAICS codes on your I-O table.
Thank you, Fugue!
my previous attached file NAICS97 contains only fiscal year 1997 data (at least according to the database I extract it from). Attached is a more complete NAICS list, it contains 115 as you suggested in your example.
I agree the exact match is not ideal. My understandings is I-O industry classification by Bureau of Economic Analysis has more detailed industry classifications than NAICS and SIC ,
suppose I want to match at 4 digit I-O industry level, how should I code this,
do I have to write many if statement, such as
i-o industry animal production 1130:
if NAICS="1131" OR "1132" OR "1133" THEN
NAICS_all table has the following entries
113
113110
113210
113310
which one should I match with which.
As an aside, I doubt that the I-O codes are more detailed than NAICS: the I-O codes are basically a collapsed version of NAICS.
To reiterate my previous point, I don't think you can use the I-O code to match to NAICS codes, except crudely.
For example, I-O industry 1111A0 Oilseed farming is related to NAICS codes 11111 and 11112 according to your I-O table. Based on this sample, you could assume that the first four digits of the I-O code match the first 4 digits of the NAICS code in the NAICS97 table.
But, I-O industry 1111B0 Grain farming also begins with 1111; however, it is related to different NAICS codes (11113, 11114, 11115, 11116, and 11119). Matching between the first four digits of the I-O code and the first four digits in the NAICS97 table in the above two examples would return erroneous matches.
Given that you are trying to match the I-O file and the NAICS97 file for whatever reason, then I suggest that you match the "related 1997 NAICS codes" from the I-O file to the codes in the NAICS97 file. A couple of different ways you could do this:
I strongly suggest you do not use the I-O codes to attempt exact matches to the NAICS table unless you are ok with fuzzy matches.
FYI -- your revised NAICS97 table is still incomplete: it does not have every possible NAICS code. However, that may be irrelevant depending on where you are pulling the data from and what you are trying to accomplish.
FYI -- here is a link to a document which contains the concordance between I-O industry and NAICS. Go to printed page 39 (pdf page 21).
http://www.bea.gov/scb/pdf/2002/12December/1202I-OAccounts2.pdf
This confirms my belief that the matching between your two files should be based on NAICS code.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.