Re: merge, spedis, do loop, or macro

Bal23 · Posted 07-20-2016 12:56 PM

you are right.

However, I need to modify my variable from the original dataset, so those values like

a287.8 a286.3 a298.90

will be truncated as

a28 a28 a29

therefore I have duplicate records

you see, if I do not truncate, I cannot try your solution, since the values are not exactly what the reference tables are, for example, the reference list only list

a287, a287.0

but a287.8 should be of course considered as icd10 as well

and some are like

v23.34 v1.34

so i decide only to get the first three then match

Reeza · Posted 07-20-2016 01:10 PM

Why can't you remove the duplicates?

If you truncate to 3 chars that's a category, if you try and merge on that you'll have multiple matches and your table will have multiple matches. Also, you could have incorrect matches if the values after aren't valid.

Why not expand to the full list of reference values. I'm sure there's a list on CMS website.

I'm fairly certain that most DBs remove the decimal and list the values as A2878.

Quentin · Posted 07-20-2016 08:29 AM

I think @Reeza is right that a PROC FORMAT approach will be easiest. Formats allow simple look-ups like you need. And if you are working with ICD9 codes, you probably already have formats that labels them all (or if you don't, it's worth building the formats).

Below is a small example of using formats to look up each code in HAVE to assign it a label per ICD9, and a label per ICD10. Of course you can add logic to catch codes that have both ICD9 and ICD10 labels, or neither. But it's a starting point.

If you need to build the formats from existing data, then PROC FORMAT CNTLIN is the way to go (not shown below).

data have; 
input id $1. icd : $3.; 
cards;
1 E01
2 E02
3 E03
4 E33
5 E99
; 
run;

proc format;
  value $ICDnine
    "E33"="eye dise"
    "E03"="unspecific dis"
    other=" "
  ;
  value $ICDten
    "E01"="Iodine-deficiency related thyroid disorders and allied conditions"
    "E02"="Subclinical iodine-deficiency hypothyroidism"
    "E03"="Other hypothyroidism"
    other=" "
  ;
run;

data want; 
  set have;
  length ICD9text $100 ICD10text $100;
  ICD9text=put(icd,$ICDnine.);
  ICD10text=put(icd,$ICDten.);
  put (icd icd9text icd10text)(=);
run;

Returns in log:

199  data want;
200    set have;
201    length ICD9text $100 ICD10text $100;
202    ICD9text=put(icd,$ICDnine.);
203    ICD10text=put(icd,$ICDten.);
204    put (icd icd9text icd10text)(=);
205  run;

icd=E01 ICD9text=  ICD10text=Iodine-deficiency related thyroid disorders and allied conditions
icd=E02 ICD9text=  ICD10text=Subclinical iodine-deficiency hypothyroidism
icd=E03 ICD9text=unspecific dis ICD10text=Other hypothyroidism
icd=E33 ICD9text=eye dise ICD10text=
icd=E99 ICD9text=  ICD10text=
NOTE: There were 5 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 5 observations and 4 variables.

The Boston Area SAS Users Group is hosting free webinars!

Register now at https://www.basug.org/events.

Bal23 · Posted 07-20-2016 10:12 AM

thank you

my original dataset contains values like A2.89, K299.47, 034.568,

They do not quite match the reference dataset, which already remove "."

i need some more flexible ways, which means, do I have to modify the value of reference dataset and my have dataset first, then use CNTLIN?

Quentin · Posted 07-20-2016 10:17 AM

Yes, for the PROC FORMAT approach (and hash approach) the values in your HAVE data must match the values defined in the format (or hash table key) exactly. So you could clean the values to make them consistent. Or if you want FUZZY matching, would need a different approach.

The Boston Area SAS Users Group is hosting free webinars!

Register now at https://www.basug.org/events.

ballardw · Posted 07-20-2016 10:29 AM

@Bal23 wrote:

thank you

my original dataset contains values like A2.89, K299.47, 034.568,

They do not quite match the reference dataset, which already remove "."

i need some more flexible ways, which means, do I have to modify the value of reference dataset and my have dataset first, then use CNTLIN?

I do a very small amount of work with ICD codes but your comment about "already remove "."" makes me very concerned as those decimals are very significant and a remember some where it is possibly to have A2.8 and A28. type structures. I would question the decision about creating any type of reference table without the decimals.

And for formats I might also recommend, though it is more work, to investigate MULTILABEL formats. When used with the procedures that support them it makes reports of the form "I want all injuries to feet with a total AND the break out by subcategories of foot injuries as well" easier.

Bal23 · Posted 07-20-2016 10:47 AM

thanks

your are right

for the current project, i only need to tell whether it is icd9 or 10, so A2.8, OR A28.9 are both icd-10, therefore this is not my concern for this