Re: merge, spedis, do loop, or macro

Bal23 · Posted 07-19-2016 03:23 PM

I have a large dataset, contains values either icd9 , icd10, or something else. I can use merge to a reference dataset to see whether it is icd9 or icd10. But I am thinking whether there are some other ways, using, strip, trim, or substr, index, or spedis function.

If I use spedis function, I can first merge it to icd9 reference list and icd10 reference list. Then I compare the points I get, to judge whether it is icd9 or icd10. Because there are lots of values neither icd9 nor icd10, that needs some additional work, means that I have to merge back to the original large dataset, and the duplicate record is somewhat hard to remove; or at least time consuming

I want to know if

Can I skip merge, but use do loop, or macro, useIf strip(string) = ‘ ‘ ( the value is on another dataset, icd10 code, reference dataset, about 69823 codes) then result =”10”
If strip(string) = ‘ ‘ (but the value is on another dataset, icd9 code, reference dataset, about 14025 codes) then result =”9”
Or any other more efficient way to do, without merge

Thanks

Quentin · Posted 07-19-2016 04:00 PM

You might try loading your lookup tables of ICD9 codes and ICD10 codes into two different hash tables. Then you can use hash table lookups. One benefit of this approach is don't have to sort the data like you would with a merge. If you want to see some sample code, I would suggest you post a small amount of sample data.

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.

Bal23 · Posted 07-19-2016 04:36 PM

attached as a sample dataset

Quentin · Posted 07-19-2016 04:46 PM

@Bal23, please post a small amout of sample data as text in your question (preferably a data step with a cards statement). I don't trust Excel files (too many viruses / bad data conversions). You could post perhaps three small datasets: 10 records of sample data, 5 records of icd-9 lookups, and 5 records of icd-10 lookups. I think that should be enough for people to work with.

The exact matches should be easy to find (with hash approach or @Reeza's format approach, which may be easier for you if you haven't played with hashes before). If you want fuzzy matching of the sort that can be done with SPEDIS function, that will take more thought.

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.

Bal23 · Posted 07-19-2016 05:03 PM

data have; length id icd date 10; format icd dollar 8.; 
input id date icd; datalines; id date icd 
123 101510 a09
  234 51812 k09 
256 61912 h980 
345 120213 98 
534 12310 overw 
2133 040512 e03
526 32116 underw
 21 10213 e8808
 56 20115 v342 
23   30314 345
41  020215 278
; run;

data icd-10;

input icd10 longdes;

E00 Congenital iodine-deficiency syndrome

E01 Iodine-deficiency related thyroid disorders and allied conditions

E02 Subclinical iodine-deficiency hypothyroidism

E03 Other hypothyroidism

E04 Other nontoxic goiter

E05 Thyrotoxicosis [hyperthyroidism]

E06 Thyroiditis

A09 Infectious gastroenteritis and colitis, unspecified

H99 Other and unspecified disorders of circulatory system

data icd9 longdes;

278 evans sym

0345 inflam

0098 colon dis

v32 hand injury

v54 rib fracture

e33 eye dise

e03 unspecific dis

e03 could be either icd9 or 10, some are neither icd9 nor icd10

Bal23 · Posted 07-20-2016 09:52 AM

how to do with hash tables?

Ludwig61 · Posted 07-19-2016 04:03 PM

#3 Use a hash object. Load your ICD9 and ICD10 reference tables into one hash with lookup values of 'ICD9' and 'ICD10' respectively. See if the code value in each row of your data is in the hash and retrieve either the 'ICD9 ' or 'ICD10' from that hit. If you don't get a hit, assign the value 'Other'.

Happy provider of untested code

Bal23 · Posted 07-19-2016 04:38 PM

Sorry I do not know anything ab hash object. I have provided my smalle sample data. Would you please provide sas code? Thanks.

Reeza · Posted 07-19-2016 04:39 PM

Pretty sure I suggested this earlier and I think it's probably the simplest in many ways. Create two formats, you can do that from reference tables, look up CNTLIN. Add in an other for not found formats.

Then the code can be:

if put(code, icd9_fmt.) ne 'Other' then code_type='ICD9';

else if put(code, icd10_fmt.) ne 'Other' then code_type='ICD10';

else code_type='Other';

Bal23 · Posted 07-19-2016 05:06 PM

sorry Reeza, I do not understand it. I did check cntlin online, but still have no clue on how to use it.

I have provided my sample dataset, as well as two reference dataset, would you please provide more details? Thanks.

error_prone · Posted 07-19-2016 05:19 PM

Using a dataset to create a format is explained in http://www2.sas.com/proceedings/forum2007/068-2007.pdf.

Reeza · Posted 07-19-2016 07:28 PM

@Bal23 wrote:

sorry Reeza, I do not understand it. I did check cntlin online, but still have no clue on how to use it.

I have provided my sample dataset, as well as two reference dataset, would you please provide more details? Thanks.

Where are you checking? I highly recommend searching for papers on lexjansen.com + keywords introduction/beginner which will provide you with a lot of resources.

Bal23 · Posted 07-20-2016 10:50 AM

Thanks. I did look at a paper, PROC FORMAT: USE OF THE CNTLIN OPTION, by Karuna

the author explained the duplicate records, So the id SHOULD BE unique in order for the format procedure to succesffully create a format for that variable

for my project, there are duplicate records and I cannot remove, I need to keep them

any advice or more detailed info on this solution?

Bal23 · Posted 07-20-2016 10:51 AM

Thanks. I did look at a paper, PROC FORMAT: USE OF THE CNTLIN OPTION, by Karuna

the author explained the duplicate records, So the id SHOULD BE unique in order for the format procedure to succesffully create a format

for that variable

for my project, there are duplicate records and I cannot remove, I need to keep them

any advice or more detailed info on this solution?

Reeza · Posted 07-20-2016 12:40 PM

@Bal23 wrote:

Thanks. I did look at a paper, PROC FORMAT: USE OF THE CNTLIN OPTION, by Karuna

the author explained the duplicate records, So the id SHOULD BE unique in order for the format procedure to succesffully create a format

for that variable

for my project, there are duplicate records and I cannot remove, I need to keep them

any advice or more detailed info on this solution?

The format is created based on your ICD9/10 reference tables. Each ICD9/10 code is unique. The format is applied to your other table and it doesn't matter how many duplicates.

Perhaps you should try the solution and see if it works rather than guessing all the reasons it may not work.

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away