BookmarkSubscribeRSS Feed
Bal23
Lapis Lazuli | Level 10

you are right.

However, I need to modify my variable from the original dataset, so those values like

a287.8 a286.3 a298.90

will be truncated as

a28 a28 a29

therefore I have duplicate records

 

you see, if I do  not truncate, I cannot try your solution, since the values are not exactly what the reference tables are, for example, the reference list only list

a287, a287.0

but a287.8 should be of course considered as icd10 as well

and some are like

v23.34 v1.34

so i decide only to get  the first three then match

 

Reeza
Super User

Why can't you remove the duplicates?

If you truncate to 3 chars that's a category, if you try and merge on that you'll have multiple matches and your table will have multiple matches. Also, you could have incorrect matches if the values after aren't valid.

 

Why not expand to the full list of reference values. I'm sure there's a list on CMS website. 

I'm fairly certain that most DBs remove the decimal and list the values as A2878. 

 

 

Quentin
Super User

I think @Reeza is right that a PROC FORMAT approach will be easiest.  Formats allow simple look-ups like you need.  And if you are working with ICD9 codes, you probably already have formats that labels them all (or if you don't, it's worth building the formats).

 

Below is a small example of using formats to look up each code in HAVE to assign it a label per ICD9, and a label per ICD10.  Of course you can add logic to catch codes that have both ICD9 and ICD10 labels, or neither.  But it's a starting point. 

 

If you need to build the formats from existing data, then PROC FORMAT CNTLIN is the way to go (not shown below).

 

data have; 
input id $1. icd : $3.; 
cards;
1 E01
2 E02
3 E03
4 E33
5 E99
; 
run;

proc format;
  value $ICDnine
    "E33"="eye dise"
    "E03"="unspecific dis"
    other=" "
  ;
  value $ICDten
    "E01"="Iodine-deficiency related thyroid disorders and allied conditions"
    "E02"="Subclinical iodine-deficiency hypothyroidism"
    "E03"="Other hypothyroidism"
    other=" "
  ;
run;

data want; 
  set have;
  length ICD9text $100 ICD10text $100;
  ICD9text=put(icd,$ICDnine.);
  ICD10text=put(icd,$ICDten.);
  put (icd icd9text icd10text)(=);
run;

Returns in log:

 

199  data want;
200    set have;
201    length ICD9text $100 ICD10text $100;
202    ICD9text=put(icd,$ICDnine.);
203    ICD10text=put(icd,$ICDten.);
204    put (icd icd9text icd10text)(=);
205  run;

icd=E01 ICD9text=  ICD10text=Iodine-deficiency related thyroid disorders and allied conditions
icd=E02 ICD9text=  ICD10text=Subclinical iodine-deficiency hypothyroidism
icd=E03 ICD9text=unspecific dis ICD10text=Other hypothyroidism
icd=E33 ICD9text=eye dise ICD10text=
icd=E99 ICD9text=  ICD10text=
NOTE: There were 5 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 5 observations and 4 variables.
BASUG is hosting free webinars Next up: Mark Keintz presenting History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies on May 8. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Bal23
Lapis Lazuli | Level 10

thank you

my original dataset contains values like A2.89, K299.47, 034.568,

They do not quite match the reference dataset, which already remove "."

i need some more flexible ways, which means, do I have to modify the value of reference dataset  and my have dataset first, then use  CNTLIN?

 

Quentin
Super User

Yes, for the PROC FORMAT approach (and hash approach) the values in your HAVE data must match the values defined in the format (or hash table key) exactly. So you could clean the values to make them consistent.  Or if you want FUZZY matching, would need a different approach. 

BASUG is hosting free webinars Next up: Mark Keintz presenting History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies on May 8. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
ballardw
Super User

@Bal23 wrote:

thank you

my original dataset contains values like A2.89, K299.47, 034.568,

They do not quite match the reference dataset, which already remove "."

i need some more flexible ways, which means, do I have to modify the value of reference dataset  and my have dataset first, then use  CNTLIN?

 


I do a very small amount of work with ICD codes but your comment about "already remove "."" makes me very concerned as those decimals are very significant and a remember some where it is possibly to have A2.8 and A28. type structures. I would question the decision about creating any type of reference table without the decimals.

 

And for formats I might also recommend, though it is more work, to investigate MULTILABEL formats. When used with the procedures that support them it makes reports of the form "I want all injuries to feet with a total AND the break out by subcategories of foot injuries as well" easier.

Bal23
Lapis Lazuli | Level 10

thanks

your are right

for the current project, i only need to tell whether it is icd9 or 10, so A2.8, OR A28.9 are both icd-10, therefore this is not my concern for this

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 21 replies
  • 2801 views
  • 9 likes
  • 6 in conversation