I have a large dataset, contains values either icd9 , icd10, or something else. I can use merge to a reference dataset to see whether it is icd9 or icd10. But I am thinking whether there are some other ways, using, strip, trim, or substr, index, or spedis function.
If I use spedis function, I can first merge it to icd9 reference list and icd10 reference list. Then I compare the points I get, to judge whether it is icd9 or icd10. Because there are lots of values neither icd9 nor icd10, that needs some additional work, means that I have to merge back to the original large dataset, and the duplicate record is somewhat hard to remove; or at least time consuming
I want to know if
Thanks
attached as a sample dataset
@Bal23, please post a small amout of sample data as text in your question (preferably a data step with a cards statement). I don't trust Excel files (too many viruses / bad data conversions). You could post perhaps three small datasets: 10 records of sample data, 5 records of icd-9 lookups, and 5 records of icd-10 lookups. I think that should be enough for people to work with.
The exact matches should be easy to find (with hash approach or @Reeza's format approach, which may be easier for you if you haven't played with hashes before). If you want fuzzy matching of the sort that can be done with SPEDIS function, that will take more thought.
data have; length id icd date 10; format icd dollar 8.;
input id date icd; datalines; id date icd
123 101510 a09
234 51812 k09
256 61912 h980
345 120213 98
534 12310 overw
2133 040512 e03
526 32116 underw
21 10213 e8808
56 20115 v342
23 30314 345
41 020215 278
; run;
data icd-10;
input icd10 longdes;
E00 Congenital iodine-deficiency syndrome
E01 Iodine-deficiency related thyroid disorders and allied conditions
E02 Subclinical iodine-deficiency hypothyroidism
E03 Other hypothyroidism
E04 Other nontoxic goiter
E05 Thyrotoxicosis [hyperthyroidism]
E06 Thyroiditis
A09 Infectious gastroenteritis and colitis, unspecified
H99 Other and unspecified disorders of circulatory system
data icd9 longdes;
278 evans sym
0345 inflam
0098 colon dis
v32 hand injury
v54 rib fracture
e33 eye dise
e03 unspecific dis
e03 could be either icd9 or 10, some are neither icd9 nor icd10
how to do with hash tables?
#3 Use a hash object. Load your ICD9 and ICD10 reference tables into one hash with lookup values of 'ICD9' and 'ICD10' respectively. See if the code value in each row of your data is in the hash and retrieve either the 'ICD9 ' or 'ICD10' from that hit. If you don't get a hit, assign the value 'Other'.
Sorry I do not know anything ab hash object. I have provided my smalle sample data. Would you please provide sas code? Thanks.
Pretty sure I suggested this earlier and I think it's probably the simplest in many ways. Create two formats, you can do that from reference tables, look up CNTLIN. Add in an other for not found formats.
Then the code can be:
if put(code, icd9_fmt.) ne 'Other' then code_type='ICD9';
else if put(code, icd10_fmt.) ne 'Other' then code_type='ICD10';
else code_type='Other';
sorry Reeza, I do not understand it. I did check cntlin online, but still have no clue on how to use it.
I have provided my sample dataset, as well as two reference dataset, would you please provide more details? Thanks.
Using a dataset to create a format is explained in http://www2.sas.com/proceedings/forum2007/068-2007.pdf.
@Bal23 wrote:
sorry Reeza, I do not understand it. I did check cntlin online, but still have no clue on how to use it.
I have provided my sample dataset, as well as two reference dataset, would you please provide more details? Thanks.
Where are you checking? I highly recommend searching for papers on lexjansen.com + keywords introduction/beginner which will provide you with a lot of resources.
Thanks. I did look at a paper, PROC FORMAT: USE OF THE CNTLIN OPTION, by Karuna
the author explained the duplicate records, So the id SHOULD BE unique in order for the format procedure to succesffully create a format for that variable
for my project, there are duplicate records and I cannot remove, I need to keep them
any advice or more detailed info on this solution?
Thanks. I did look at a paper, PROC FORMAT: USE OF THE CNTLIN OPTION, by Karuna
the author explained the duplicate records, So the id SHOULD BE unique in order for the format procedure to succesffully create a format
for that variable
for my project, there are duplicate records and I cannot remove, I need to keep them
any advice or more detailed info on this solution?
@Bal23 wrote:
Thanks. I did look at a paper, PROC FORMAT: USE OF THE CNTLIN OPTION, by Karuna
the author explained the duplicate records, So the id SHOULD BE unique in order for the format procedure to succesffully create a format
for that variable
for my project, there are duplicate records and I cannot remove, I need to keep them
any advice or more detailed info on this solution?
The format is created based on your ICD9/10 reference tables. Each ICD9/10 code is unique. The format is applied to your other table and it doesn't matter how many duplicates.
Perhaps you should try the solution and see if it works rather than guessing all the reasons it may not work.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.