Dear Forum Members,
Thanks for taking time to read my request !!!
Attached are two tables I want to link.
I can see that the exact match may lead to very sparse matches. So the alternative is to use I-O industry at 4 digit level. E.g. I-O industry 1110 contains 10 different detailed I-O code, a coarse match would consider any of the following NAICS goes to 1110 industry and its sub-industry.
11111 | 11112 | ||||||
11113 | 11114 | 11115 | 11116 | 11119 | |||
1112 | |||||||
11131 | 11132 | 111331 | 111332 | 111333 | 111334 | 111336 | 111339 |
111335 | |||||||
1114 | |||||||
11191 | |||||||
11192 | |||||||
11193 | 111991 | ||||||
11194 | 111992 | 111998 |
Ideally I would like to first do an exact match, then see if I lose many observations, I will do the coarse match.
Could someone gives a sample code to do this?
Thanks,
Lan
Is there a rule that describes that 1111BO could be matched with 111150?
Besides that, I would build a format from the NAICS table, with an OTHER value which you could test on.
So, in the data step, first put for an exact match, if it's not a hit, put with a substr() on I-O code (4 pos, then might even 3 pos if there are still misss matches.
Thanks, LinusH. I-O tables shows that on row 1111BO, it corresponds to 111113, 111114, 11115, 11116, 111119, I use 111150 as a match to right censor the zero.
what do you mean by 3 pos?
The matching keys to NAICS already seem to exist in the I-O table. The exception seems to be at the I-O sector (no I-O code) and subsector level (4 digit I-O codes). But . . .
Setting the exceptions aside for now . . . .In your I-O table, all I-O codes below the sector and subsector level have one-to-many columns of matching NAICS codes. Note the NAICS codes in your I-O table are not necessarily at the lowest level of classification. (NAICS is broken down by sectors (2-digit), subsector (3-digit), industry groups (4-digit), industries (5-digit) and "national industries" (6-digit)). Each NAICS "code" in your I-O table can be matched to the corresponding NAICS codes, at the appropriate level of detail, in a table having a complete listing of NAICS codes.
But, your NAICS97 table does not have a complete enumeration of all NAICS codes.
For example, I-O industry 115000 (Agriculture and forestry support activities) shows that NAICS subsector 115 (Support Activities for Agriculture and Forestry) is related. (This implies that all 6-digit codes in NAICS subsector 115 are related to I-O industry 115000). But your NAICS97 table has no entries starting with 115.
Is NAICS97 deliberately incomplete (missing sector, subsector, industry group, industry or national industry)??
In any case, I do not think you can use the I-O code alone to match to NAICS, except very crudely at the 4-digit level. (Results would be inaccurate in many cases). You need to use the NAICS codes on your I-O table.
Thank you, Fugue!
my previous attached file NAICS97 contains only fiscal year 1997 data (at least according to the database I extract it from). Attached is a more complete NAICS list, it contains 115 as you suggested in your example.
I agree the exact match is not ideal. My understandings is I-O industry classification by Bureau of Economic Analysis has more detailed industry classifications than NAICS and SIC ,
suppose I want to match at 4 digit I-O industry level, how should I code this,
do I have to write many if statement, such as
i-o industry animal production 1130:
if NAICS="1131" OR "1132" OR "1133" THEN
NAICS_all table has the following entries
113
113110
113210
113310
which one should I match with which.
As an aside, I doubt that the I-O codes are more detailed than NAICS: the I-O codes are basically a collapsed version of NAICS.
To reiterate my previous point, I don't think you can use the I-O code to match to NAICS codes, except crudely.
For example, I-O industry 1111A0 Oilseed farming is related to NAICS codes 11111 and 11112 according to your I-O table. Based on this sample, you could assume that the first four digits of the I-O code match the first 4 digits of the NAICS code in the NAICS97 table.
But, I-O industry 1111B0 Grain farming also begins with 1111; however, it is related to different NAICS codes (11113, 11114, 11115, 11116, and 11119). Matching between the first four digits of the I-O code and the first four digits in the NAICS97 table in the above two examples would return erroneous matches.
Given that you are trying to match the I-O file and the NAICS97 file for whatever reason, then I suggest that you match the "related 1997 NAICS codes" from the I-O file to the codes in the NAICS97 file. A couple of different ways you could do this:
I strongly suggest you do not use the I-O codes to attempt exact matches to the NAICS table unless you are ok with fuzzy matches.
FYI -- your revised NAICS97 table is still incomplete: it does not have every possible NAICS code. However, that may be irrelevant depending on where you are pulling the data from and what you are trying to accomplish.
FYI -- here is a link to a document which contains the concordance between I-O industry and NAICS. Go to printed page 39 (pdf page 21).
http://www.bea.gov/scb/pdf/2002/12December/1202I-OAccounts2.pdf
This confirms my belief that the matching between your two files should be based on NAICS code.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.