I have ICD9 codes stored as character data. I would like to create new numeric variables based on the values in the ICD code variables. I do not want to type out every code there is, and I don't know them all.
ICD1 codes = 462, 464, 465, 466, 480 thru 488; 507.0; 997.31 >> Respiratory = 1, else = 0;
ICD1 codes = 599.0 thru 599.5 >> Infectious = 1; else = 0;
ICD1 codes = 787.20 thru 787.29 >> Dysphagia = 1
ICD1 codes = 308 and 310 >> Agitation = 1
ICD1 codes = 707.00 thru 707.9 >> SkinUlcer = 1
Thanks!!
In a DATA step, try:
icd1_numeric = input(icd1, 8.);
At least for ICD9 codes, there are some that contain letters. This won't work unless the incoming value is purely numeric.
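To see what happens with a code that contains a letter, here is a small sketch. The dataset name and the sample values are made up for illustration; the ?? modifier on INPUT just suppresses the invalid-data messages.

```sas
/* Hypothetical sample values; "V45.81" stands in for any code with a letter */
data check;
  length icd1 $8;
  input icd1 $;
  /* ?? suppresses the invalid-data note and keeps _ERROR_ from being set */
  icd1_numeric = input(icd1, ?? 8.);
  datalines;
462
599.0
V45.81
;
run;

proc print data=check;
run;
```

The first two rows convert to numbers, while the row with the letter comes back as a missing value, so any code like that would be lost in a numeric comparison.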
Good luck.
They do contain letters. In fact, I have 20 ICD code variables. So, I can't change the data type to numeric as you suggested.
I would suggest going to the ICD web page and downloading the latest code list. It is provided in txt and Excel, so you can easily import it and then use it as the list to merge against:
As for your categorisation, 462 = respiratory, I do not know how you came to that conclusion, as there doesn't appear to be that classification in ICD?
I don't quite understand what your suggestions are. I pulled the xls file but the codes didn't look like anything I have (no decimals). My problem is not finding the ICD codes for the categories but listing the codes that are in sequence without having to type them all (because I couldn't possibly know each one of them). I wanted to see if I can do something like this, which works for numeric variables.
if ICD1 in (462, 464, 465, 466, 480-488, 507.0, 997.31) then Respiratory = 1; else Respiratory = 0;
Any ideas how to deal with character variables?
I'm not 100% clear on what you're trying to do, but I'm assuming you've got a file with a code and its value (description). You want to assign your own value based on some kind of logic.
I would suggest importing your original list into a dataset, then using proc format to read from it and applying your logic to assign whatever values you want.
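As a rough sketch of the PROC FORMAT idea, with the ranges typed in by hand rather than read from an imported list: the format name $icdgrp, the labels, and the dataset names have and want are all assumptions for illustration. Character ranges are compared as text, left to right, so please check the boundaries against your own data.

```sas
/* Sketch only: $icdgrp and the labels are made-up names.
   Character ranges compare as text, so '599.0'-'599.5' covers
   '599.0', '599.00', '599.49' and '599.5', but NOT '599.51',
   which sorts after '599.5'. Widen the upper bound if needed. */
proc format;
  value $icdgrp
    '599.0'  - '599.5'  = 'Infectious'
    '787.20' - '787.29' = 'Dysphagia'
    '308', '310'        = 'Agitation'
    '707.00' - '707.9'  = 'SkinUlcer'
    other               = 'Other';
run;

data want;
  set have;   /* have is an assumed input dataset containing ICD1 */
  length icdgrp $10;
  /* left() guards against leading blanks, in case the data has any */
  icdgrp     = put(left(icd1), $icdgrp.);
  Infectious = (icdgrp = 'Infectious');
  Dysphagia  = (icdgrp = 'Dysphagia');
  Agitation  = (icdgrp = 'Agitation');
  SkinUlcer  = (icdgrp = 'SkinUlcer');
run;
```

Each comparison in parentheses evaluates to 1 or 0, which gives the numeric flags directly.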
I know this isn't a very elaborate answer, but it's all I can suggest based on my understanding.
Good luck.
First a word of warning ... if you don't know all the values you are looking for, you can never be sure that any solution is correct.
There are ways to abbreviate character searches. Consider this:
if icd1 =: '599.' then infectious=1; else infectious=0;
For any icd1 that begins with the characters "599.", the statement assigns 1. That would include other values that perhaps should not be assigned in that way, such as "599.9" and "599.V". So there are risks.
You can use this technique with a list of values:
if icd1 in: ('466.', '488.') then respiratory=1; else respiratory=0;
Of course, you'll have to expand the list, but you don't need to know all the "488." series of values.
Finally, when using :, make sure you code the decimal point. If you were to code without a decimal point you are taking risks:
if icd1 in: ('466', '488') then ...
This would also give you a match for 4-digit codes that begin with 466 or 488, such as "4662.1". I'm not saying that this is a valid code (I don't really know), but you never know what will actually appear in your data.
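Putting the pieces together for several ICD variables, a minimal sketch might look like this. The dataset names have and want and the variable names ICD1-ICD20 are assumptions, and the prefix lists are deliberately incomplete.

```sas
/* Sketch: have/want are assumed dataset names; extend the prefix lists as needed */
data want;
  set have;
  array _icdvar {*} ICD1-ICD20;
  /* Start every flag at 0 so a later non-matching code cannot undo a match */
  Respiratory = 0;
  Infectious  = 0;
  do i = 1 to dim(_icdvar);
    if _icdvar{i} in: ('466.', '488.') then Respiratory = 1;
    if _icdvar{i} =:  '599.'           then Infectious  = 1;
  end;
  drop i;
run;
```

Setting the flags to 0 once before the loop, and only ever switching them to 1 inside it, means a match in any of the 20 variables is kept.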
Good luck.
I think you understood what I'm trying to do. Yes, I do not know all the exact values down to 2 decimals. But I'll give your code a try. Thank you.
If your source document has "like" codes grouped together, PROC FORMAT may be a way to go. But without seeing a source it is a bit difficult to make specific suggestions.
Did someone provide a document of which codes get which assignment? If you could share that, we might have a few ideas.
They are stated in my initial posting. The only difference is that I have ICD1 through ICD20. So for all of these ICD variables I need to recode them as below. The highlighted values are the ones I have problems with, since character values don't work that way.
array _icdvar ICD1--ICD20;
do over _icdvar;
if _icdvar in ('462', '464', '465', '466', '480--488', '507.0', '997.31') then Respiratory = 1; else Respiratory = 0;
if _icdvar in ('599.0--599.5') then Infectious = 1; else Infectious = 0;
if _icdvar in ('787.20'--'787.29') then Dysphagia = 1; else Dysphagia = 0;
if _icdvar in ('308', '310') then Agitation = 1; else Agitation = 0;
if _icdvar in ('707.00'--'707.9') then SkinUlcer = 1; else SkinUlcer = 0;
end;
Just the character to a number version then:
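array _icdvar ICD1--ICD20;
/* set the flag to 0 once, so a later non-matching code cannot reset it */
Respiratory = 0;
do over _icdvar;
/* ?? suppresses invalid-data messages; codes with letters become missing */
tmp_var = input(_icdvar, ?? best.);
if tmp_var in (462,464,465,466,507,997.31) or (480 <= tmp_var <= 488) then Respiratory = 1;
...
end;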
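The same numeric approach can be extended to the decimal ranges from the original question. This is only a sketch: have and want are assumed dataset names, ICD1-ICD20 is assumed to be a numbered list of character variables, and the round() call is a guard against floating point representation rather than something the thread asked for.

```sas
data want;
  set have;
  array _icdvar {*} ICD1-ICD20;
  /* Flags start at 0 and are only ever switched to 1 inside the loop */
  Infectious = 0;
  Dysphagia  = 0;
  SkinUlcer  = 0;
  do i = 1 to dim(_icdvar);
    /* Codes containing letters become missing and fail every range test */
    tmp_var = round(input(_icdvar{i}, ?? best12.), 0.01);
    if 599.0  <= tmp_var <= 599.5  then Infectious = 1;
    if 787.20 <= tmp_var <= 787.29 then Dysphagia  = 1;
    if 707.00 <= tmp_var <= 707.9  then SkinUlcer  = 1;
  end;
  drop i tmp_var;
run;
```

Because a missing value sorts below every number in SAS, the letter codes simply never satisfy a range condition. They would need the character approach if any of them belong in a category.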