DATA Step, Macro, Functions and more

merge, spedis, do loop, or macro

Reply
Super Contributor
Posts: 345

merge, spedis, do loop, or macro

I have a large dataset, contains values either icd9 , icd10, or something else. I can use merge to a reference dataset to see whether it is icd9 or icd10. But I am thinking whether there are some other ways, using, strip, trim, or substr, index, or spedis function.

If I use spedis function, I can first merge it to icd9 reference list and icd10 reference list. Then I compare the points I get, to judge whether it is icd9 or icd10. Because there are lots of values neither icd9 nor icd10, that needs some additional work, means that I have to merge back to the original large dataset, and the duplicate record is somewhat hard to remove; or at least time consuming

I want to know if

  1. Can I skip merge, but use do loop, or macro, useIf strip(string) = ‘ ‘ ( the value is on another dataset, icd10 code, reference dataset, about 69823 codes) then result =”10”
  2. If strip(string) = ‘ ‘ (but the value is on another dataset, icd9 code, reference dataset, about 14025 codes) then result =”9”
  3. Or any other more efficient way to do, without merge

Thanks

PROC Star
Posts: 1,322

Re: merge, spedis, do loop, or macro

You might try loading your lookup tables of ICD9 codes and ICD10 codes into two different hash tables. Then you can use hash table lookups. One benefit of this approach is don't have to sort the data like you would with a merge. If you want to see some sample code, I would suggest you post a small amount of sample data.
Super Contributor
Posts: 345

Re: merge, spedis, do loop, or macro

attached as a sample dataset

PROC Star
Posts: 1,322

Re: merge, spedis, do loop, or macro

@wenling, please post a small amout of sample data as text in your question (preferably a data step with a cards statement).  I don't trust Excel files (too many viruses / bad data conversions).  You could post perhaps three small datasets: 10 records of sample data, 5 records of icd-9 lookups, and 5 records of icd-10 lookups.  I think that should be enough for people to work with.  

 

The exact matches should be easy to find (with hash approach or @Reeza's format approach, which may be easier for you if you haven't played with hashes before).  If you want fuzzy matching of the sort that can be done with SPEDIS function, that will take more thought.

Super Contributor
Posts: 345

Re: merge, spedis, do loop, or macro

[ Edited ]
data have; length id icd date 10; format icd dollar 8.; 
input id date icd; datalines; id date icd
123 101510 a09
234 51812 k09
256 61912 h980
345 120213 98
534 12310 overw
2133 040512 e03
526 32116 underw
21 10213 e8808
56 20115 v342
23 30314 345
41 020215 278
; run;

data icd-10;

 

input icd10 longdes;

E00  Congenital iodine-deficiency syndrome

E01  Iodine-deficiency related thyroid disorders and allied conditions

E02  Subclinical iodine-deficiency hypothyroidism

E03  Other hypothyroidism

E04  Other nontoxic goiter

E05  Thyrotoxicosis [hyperthyroidism]

E06  Thyroiditis

A09  Infectious gastroenteritis and colitis, unspecified

H99  Other and unspecified disorders of circulatory system

 

data icd9 longdes;

278  evans sym

0345 inflam

0098 colon dis

v32  hand injury

v54  rib fracture

e33  eye dise

e03  unspecific dis

 

 

 

e03 could be either icd9 or 10, some are neither icd9 nor icd10

Super Contributor
Posts: 345

Re: merge, spedis, do loop, or macro

how to do with hash tables?

 

New Contributor
Posts: 2

Re: merge, spedis, do loop, or macro

#3 Use a hash object.  Load your ICD9 and ICD10 reference tables into one hash with lookup values of 'ICD9' and 'ICD10' respectively.  See if the code value in each row of your data is in the hash and retrieve either the 'ICD9 ' or 'ICD10' from that hit.  If you don't get a hit, assign the value 'Other'.

Happy provider of untested code
Super Contributor
Posts: 345

Re: merge, spedis, do loop, or macro

Sorry I do not know anything ab hash object. I have provided my smalle sample data. Would you please provide sas code? Thanks.

Super User
Posts: 19,770

Re: merge, spedis, do loop, or macro

Pretty sure I suggested this earlier and I think it's probably the simplest in many ways. Create two formats, you can do that from reference tables, look up CNTLIN. Add in an other for not found formats. 

 

Then the code can be:

 

if put(code, icd9_fmt.) ne 'Other' then code_type='ICD9';

else if put(code, icd10_fmt.) ne 'Other' then code_type='ICD10';

else code_type='Other';

Super Contributor
Posts: 345

Re: merge, spedis, do loop, or macro

sorry Reeza, I do not understand it. I did check cntlin online, but still have no clue on how to use it.

I have provided my sample dataset, as well as two reference dataset, would you please provide more details? Thanks.

Frequent Contributor
Posts: 149

Re: merge, spedis, do loop, or macro

 

Using a dataset to create a format is explained in http://www2.sas.com/proceedings/forum2007/068-2007.pdf.

 

Super User
Posts: 19,770

Re: merge, spedis, do loop, or macro


wenling wrote:

sorry Reeza, I do not understand it. I did check cntlin online, but still have no clue on how to use it.

I have provided my sample dataset, as well as two reference dataset, would you please provide more details? Thanks.


Where are you checking? I highly recommend searching for papers on lexjansen.com + keywords introduction/beginner which will provide you with a lot of resources. 

Super Contributor
Posts: 345

Re: merge, spedis, do loop, or macro

Thanks. I did look at a paper, PROC FORMAT: USE OF THE CNTLIN OPTION, by Karuna

the author explained the duplicate records, So the id SHOULD BE unique in order for the format procedure to succesffully create a format for that variable

for my project, there are duplicate records and I cannot remove, I need to keep them

any advice or more detailed info on this solution?

 

Super Contributor
Posts: 345

Re: merge, spedis, do loop, or macro

Thanks. I did look at a paper, PROC FORMAT: USE OF THE CNTLIN OPTION, by Karuna

 

the author explained the duplicate records, So the id SHOULD BE unique in order for the format procedure to succesffully create a format

for that variable

 

for my project, there are duplicate records and I cannot remove, I need to keep them

 

any advice or more detailed info on this solution?

 

Super User
Posts: 19,770

Re: merge, spedis, do loop, or macro


wenling wrote:

Thanks. I did look at a paper, PROC FORMAT: USE OF THE CNTLIN OPTION, by Karuna

 

the author explained the duplicate records, So the id SHOULD BE unique in order for the format procedure to succesffully create a format

for that variable

 

for my project, there are duplicate records and I cannot remove, I need to keep them

 

any advice or more detailed info on this solution?

 


 

 

The format is created based on your ICD9/10 reference tables. Each ICD9/10 code is unique. The format is applied to your other table and it doesn't matter how many duplicates.

 

Perhaps you should try the solution and see if it works rather than guessing all the reasons it may not work. 

Ask a Question
Discussion stats
  • 21 replies
  • 994 views
  • 9 likes
  • 6 in conversation