I'm using a database which contains ICD-9 codes but in a format without decimal places. I'm trying to import a list of ICD-9 codes that does contain decimal places and match it up with the one that doesn't. However I keep running into a problem of ambiguous dx codes in my database that could refer to two ICD-9s (one procedure and one diagnosis. This is a well known and widely used database so I feel like I must be overlooking something obvious.
IN MY IMPORTED LIST: normal ICD-9 codes with appropriate leading zeros and decimal.
IN MY DATABASE: everything is left justified and decimals are implied at certain places in the string. Leading zeros are inserted before the number to ensure that the decimal place is implied at the correct location.
However, the rules differ for ICD-9 procedure codes and ICD-9 diagnosis codes (and as a result a single code in my database could refer to a Dx AND a Proc)
As far as I can tell, this still leaves ambiguity since ICD procedure and diagnosis codes can take on the same value in the database. Table 1 (pasted below and also attached) breaks down the many ways “14” can be coded in the data. A number of them result in ambiguous codes (outlined in RED) in my database.
Since not all of these “14” codes actually exist, I provided examples of violations in the far right column.
Table 1. Ambiguous code combinations
PROC vs. DX | ICD-9 CODE | MarketScan code | Examples of ICD-9 codes actually in use | What happens if we just drop the decimal in SAS? |
PROC |
| 0014 | 0014 | |
PROC |
| 014 |
| 014 |
DX | 014 | 014 | 075 | 014 |
PROC |
| 0140 |
| 0140 |
DX |
| 0140 |
| 0140 |
DX |
| 0014 | 0014 | |
DX |
| 00140 | 00140 | |
DX |
| 01400 | 01400 | |
PROC | 14 | 14 | 14 | |
PROC |
| 140 |
| 140 |
DX | 140 | 140 | 140 | 140 |
PROC |
| 1400 |
| 1400 |
DX |
| 1400 |
| 1400 |
DX |
| 14000 | 14000 |
Thank you for any help/advice/guidance in advance. Don't know what I'd do with out you guys...
I do not understand why you are having trouble.
Normally your patient data would have had DX codes and PROC codes collected into different variables.
Normally your ICD9 look-up table would separate tables for DX and PROC codes (or at least another variable to indicate which type it is).
Is it possible that someone has stored the values without periods into a numeric variable instead of a character variable? (perhaps by accidentally running it through Excel instead of a real database) If so that will mess up any code that starts with zero.
JUST NOTICED THAT THE TABLE PASTED WRONG INTO THE SAS DISCUSSION FORUM. PLEASE REFER TO ATTACHED PDF INSTEAD!
THANKS AGAIN.
I don't have any subject matter expertise here but on a general level from what you describe and show us: Yes, removing the dot leads to ambiguity so in order to match the two data sources you need an additional field giving you the information whether a code is "DX" or "PROC".
I do not understand why you are having trouble.
Normally your patient data would have had DX codes and PROC codes collected into different variables.
Normally your ICD9 look-up table would separate tables for DX and PROC codes (or at least another variable to indicate which type it is).
Is it possible that someone has stored the values without periods into a numeric variable instead of a character variable? (perhaps by accidentally running it through Excel instead of a real database) If so that will mess up any code that starts with zero.
I do have separate fields for procedures that are identified using CPT codes. However, ICD-9 consists of both ICD-9 diagnosis codes and ICD-9 procedure codes. So as a result, the "diagnosis" field in my data can actually contain codes that refer to procedures, not diagnoses. I see what you mean about Excel but the coding scheme is designed by the compilers of the data set, not the result of accidentally changing the formatting.
Sorry for the time you've all spent thinking about my problem. I'll get to the bottom of this in the next few days and will be sure to post what I learn on this discussion thread.
Hi
I'm curious about your question since it would appear that we have all of the components that we need to accurately record each of the values. We have a field which tells us whether this is a diagnosis or procedure, we have the rule indicating whether or not we should position the decimal after the third or second character.
I assume you are trying to match the icd-9 column with the market scan value? why not drop the decimal in your transform then use the positional logic based on proc vs. dx to recode the variable?
If i missed something critical i apologize.
Happy to work up an example if that is really what you need.
-greg
Thanks for your response. You surmise correctly. Each claim I have refers to a single procedure/service with a number of associated diagnosis codes attached. As far as I know, I do not have a field which tells me whether any specific diagnosis field (ICD-9) is describing a procedure or a diagnosis and so do not have the information to determine when to apply my two digit decimal rule vs. my three digit decimal rule.
Got it, i saw your table with the first column and thot that was the field you could use. it would be pretty tough without knowing whether it was a dx or not. In the market scan data, we usually have dx1 -- dx15 and similar fields for procs. I would go back to the data owner and see if there is another option to understanding logic that you should use as others have suggested.
-greg
If you are really getting data from MarketScan then the DX fields should have diagnosis codes and the PROC fields will have procedure codes.
My apologies to all. I have finally figured this out and realize many of you were giving me the right answer all along but I didn't know enough to recognize it. As was pointed out (repeatedly) procedure codes go in the procedure field, diagnosis codes in the diagnosis field. I assumed all ICD-9 codes appeared in the same field. Rather, my diagnosis field can only hold ICD-9 diagnosis codes. My procedure field can contain ICD-9 procedure codes, CPT codes, or HCPCS codes. I have an additional variable which refers to which type of code is being used in the procedure field. Because CPT and HCPCS codes always have 5 digits and ICD-9 procedure codes always have four, the additional indicator variable isn't even required to differentiate between the two.
Thanks to everyone who was willing to bear with me on this one. You guys are amazing.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.