BookmarkSubscribeRSS Feed
Bal23
Lapis Lazuli | Level 10

I have a large dataset, contains values either icd9 , icd10, or something else. I can use merge to a reference dataset to see whether it is icd9 or icd10. But I am thinking whether there are some other ways, using, strip, trim, or substr, index, or spedis function.

If I use spedis function, I can first merge it to icd9 reference list and icd10 reference list. Then I compare the points I get, to judge whether it is icd9 or icd10. Because there are lots of values neither icd9 nor icd10, that needs some additional work, means that I have to merge back to the original large dataset, and the duplicate record is somewhat hard to remove; or at least time consuming

I want to know if

  1. Can I skip merge, but use do loop, or macro, useIf strip(string) = ‘ ‘ ( the value is on another dataset, icd10 code, reference dataset, about 69823 codes) then result =”10”
  2. If strip(string) = ‘ ‘ (but the value is on another dataset, icd9 code, reference dataset, about 14025 codes) then result =”9”
  3. Or any other more efficient way to do, without merge

Thanks

21 REPLIES 21
Quentin
Super User
You might try loading your lookup tables of ICD9 codes and ICD10 codes into two different hash tables. Then you can use hash table lookups. One benefit of this approach is don't have to sort the data like you would with a merge. If you want to see some sample code, I would suggest you post a small amount of sample data.
The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
Bal23
Lapis Lazuli | Level 10

attached as a sample dataset

Quentin
Super User

@Bal23, please post a small amout of sample data as text in your question (preferably a data step with a cards statement).  I don't trust Excel files (too many viruses / bad data conversions).  You could post perhaps three small datasets: 10 records of sample data, 5 records of icd-9 lookups, and 5 records of icd-10 lookups.  I think that should be enough for people to work with.  

 

The exact matches should be easy to find (with hash approach or @Reeza's format approach, which may be easier for you if you haven't played with hashes before).  If you want fuzzy matching of the sort that can be done with SPEDIS function, that will take more thought.

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
Bal23
Lapis Lazuli | Level 10
data have; length id icd date 10; format icd dollar 8.; 
input id date icd; datalines; id date icd
123 101510 a09
234 51812 k09
256 61912 h980
345 120213 98
534 12310 overw
2133 040512 e03
526 32116 underw
21 10213 e8808
56 20115 v342
23 30314 345
41 020215 278
; run;

data icd-10;

 

input icd10 longdes;

E00  Congenital iodine-deficiency syndrome

E01  Iodine-deficiency related thyroid disorders and allied conditions

E02  Subclinical iodine-deficiency hypothyroidism

E03  Other hypothyroidism

E04  Other nontoxic goiter

E05  Thyrotoxicosis [hyperthyroidism]

E06  Thyroiditis

A09  Infectious gastroenteritis and colitis, unspecified

H99  Other and unspecified disorders of circulatory system

 

data icd9 longdes;

278  evans sym

0345 inflam

0098 colon dis

v32  hand injury

v54  rib fracture

e33  eye dise

e03  unspecific dis

 

 

 

e03 could be either icd9 or 10, some are neither icd9 nor icd10

Bal23
Lapis Lazuli | Level 10

how to do with hash tables?

 

Ludwig61
Fluorite | Level 6

#3 Use a hash object.  Load your ICD9 and ICD10 reference tables into one hash with lookup values of 'ICD9' and 'ICD10' respectively.  See if the code value in each row of your data is in the hash and retrieve either the 'ICD9 ' or 'ICD10' from that hit.  If you don't get a hit, assign the value 'Other'.

Happy provider of untested code
Bal23
Lapis Lazuli | Level 10

Sorry I do not know anything ab hash object. I have provided my smalle sample data. Would you please provide sas code? Thanks.

Reeza
Super User

Pretty sure I suggested this earlier and I think it's probably the simplest in many ways. Create two formats, you can do that from reference tables, look up CNTLIN. Add in an other for not found formats. 

 

Then the code can be:

 

if put(code, icd9_fmt.) ne 'Other' then code_type='ICD9';

else if put(code, icd10_fmt.) ne 'Other' then code_type='ICD10';

else code_type='Other';

Bal23
Lapis Lazuli | Level 10

sorry Reeza, I do not understand it. I did check cntlin online, but still have no clue on how to use it.

I have provided my sample dataset, as well as two reference dataset, would you please provide more details? Thanks.

error_prone
Barite | Level 11

 

Using a dataset to create a format is explained in http://www2.sas.com/proceedings/forum2007/068-2007.pdf.

 

Reeza
Super User

@Bal23 wrote:

sorry Reeza, I do not understand it. I did check cntlin online, but still have no clue on how to use it.

I have provided my sample dataset, as well as two reference dataset, would you please provide more details? Thanks.


Where are you checking? I highly recommend searching for papers on lexjansen.com + keywords introduction/beginner which will provide you with a lot of resources. 

Bal23
Lapis Lazuli | Level 10

Thanks. I did look at a paper, PROC FORMAT: USE OF THE CNTLIN OPTION, by Karuna

the author explained the duplicate records, So the id SHOULD BE unique in order for the format procedure to succesffully create a format for that variable

for my project, there are duplicate records and I cannot remove, I need to keep them

any advice or more detailed info on this solution?

 

Bal23
Lapis Lazuli | Level 10

Thanks. I did look at a paper, PROC FORMAT: USE OF THE CNTLIN OPTION, by Karuna

 

the author explained the duplicate records, So the id SHOULD BE unique in order for the format procedure to succesffully create a format

for that variable

 

for my project, there are duplicate records and I cannot remove, I need to keep them

 

any advice or more detailed info on this solution?

 

Reeza
Super User

@Bal23 wrote:

Thanks. I did look at a paper, PROC FORMAT: USE OF THE CNTLIN OPTION, by Karuna

 

the author explained the duplicate records, So the id SHOULD BE unique in order for the format procedure to succesffully create a format

for that variable

 

for my project, there are duplicate records and I cannot remove, I need to keep them

 

any advice or more detailed info on this solution?

 


 

 

The format is created based on your ICD9/10 reference tables. Each ICD9/10 code is unique. The format is applied to your other table and it doesn't matter how many duplicates.

 

Perhaps you should try the solution and see if it works rather than guessing all the reasons it may not work. 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 21 replies
  • 3934 views
  • 9 likes
  • 6 in conversation