Calcite | Level 5

## regular expression ICD-10-CM

How do I write a regular expression to identify cases coded using ICD-10-CM?

There are 9 diagnosis codes. As long as one diagnosis code meets the definition, then disease=1.

((T3[679]9|T414|T427|T4[3579]9)[1-4].|(?!(T3[679]9|T414|T427|T4[3579]9))(T3[6-9]|T4[0-9]|T50)..[1-4])(A|$|\b)

Thanks.

Accepted Solutions
Tourmaline | Level 20

## Re: regular expression ICD-10-CM

If the expression you want to use is the one you gave, just write:

``DISEASE=prxmatch('/((T3[679]9|T414|T427|T4[3579]9)[1-4].|(?!(T3[679]9|T414|T427|T4[3579]9))(T3[6-9]|T4[0-9]|T50)..[1-4])(A|$|\b)/',DIAGNOSIS_CODE);``

which stores the position where the match starts (0 if there is no match), or

``DISEASE=prxmatch('/((T3[679]9|T414|T427|T4[3579]9)[1-4].|(?!(T3[679]9|T414|T427|T4[3579]9))(T3[6-9]|T4[0-9]|T50)..[1-4])(A|$|\b)/',DIAGNOSIS_CODE)>0;``

which gives a 0/1 flag.
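A quick way to see what this returns is to run it on a few hand-picked values. The step below is only a sketch: the data set name `check` and the three test values are made up for illustration and are not taken from the poster's data.

```sas
data check;
  length code $7;
  input code $;
  /* pattern copied from the question; >0 turns the match position into a 0/1 flag */
  DISEASE = prxmatch('/((T3[679]9|T414|T427|T4[3579]9)[1-4].|(?!(T3[679]9|T414|T427|T4[3579]9))(T3[6-9]|T4[0-9]|T50)..[1-4])(A|$|\b)/', code) > 0;
  datalines;
T401X1A
T401X1D
S42201A
;
run;

proc print data=check;
run;
```

With this pattern, T401X1A should get DISEASE=1. T401X1D (the character after the matched part is D, not A) and S42201A (no T36-T50 prefix) should both get DISEASE=0.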

8 REPLIES
PROC Star

## Re: regular expression ICD-10-CM

It is highly unlikely that many posters on the Community know anything about ICD-10-CM. I don't. If you could explain what the 9 diagnosis codes are, and provide some more example data with expected outcomes, you will have a better chance of getting suitable answers.

Calcite | Level 5

## Re: regular expression ICD-10-CM

Thank you.

The data contain:

| ID | diagnosis1 | diagnosis2 | diagnosis3 | diagnosis4 | diagnosis5 | diagnosis6 | diagnosis7 | diagnosis8 | diagnosis9 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | T12340 | T1235 | S12400 | S12340 | T123 | T1256 | S12345 | S13456 | T567 |

The definition for disease is

((T3[679]9|T414|T427|T4[3579]9)[1-4].|(?!(T3[679]9|T414|T427|T4[3579]9))(T3[6-9]|T4[0-9]|T50)..[1-4])(A|$|\b)

As long as one of the diagnoses meets the definition, the case is disease=1.

Lapis Lazuli | Level 10

## Re: regular expression ICD-10-CM

As @SASKiwi already said, it is hard to build a regex if you don't know exactly what you are looking for (what the boundaries of the codes are and what their general structure is).

But even better than getting an answer is trying to build it yourself. Developing the pattern with this site is super intuitive: https://regexr.com/38ed7

If you are not familiar with the basic concepts of regex, I recommend this blog post: https://www.janmeppe.com/blog/regex-for-noobs/

In SAS, the function you would want to use is the prxmatch function (https://documentation.sas.com/?docsetId=lefunctionsref&docsetTarget=n0bj9p4401w3n9n1gmv6tf**bleep**9...).

Also a great resource about RegEx in SAS is the regex tip sheet by the SAS Support Team: https://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf

I hope this helps you!

Calcite | Level 5

## Re: regular expression ICD-10-CM

Thanks for the resources. Very helpful.

Calcite | Level 5

## Re: regular expression ICD-10-CM

Thank you!

Here is the code in case someone else needs it.

data want;
set datahave;
array injurydx[9] $7 DIAG1-DIAG9; /* diagnosis 1-9 variables */
DISEASE=0; /* start each record as not flagged */
do i = 1 to 9;
/* set the flag when any diagnosis matches; a later non-match must not overwrite an earlier match */
if prxmatch('/((T3[679]9|T414|T427|T4[3579]9)[1-4].|(?!(T3[679]9|T414|T427|T4[3579]9))(T3[6-9]|T4[0-9]|T50)..[1-4])(A|$|\b)/',injurydx[i])>0 then DISEASE=1;
end;
drop i;
run;

Super User

## Re: regular expression ICD-10-CM

@xinyao2019 wrote:

How do I write a regular expression to identify cases coded using ICD-10-CM?

There are 9 diagnosis codes. As long as one diagnosis code meets the definition, then disease=1.

((T3[679]9|T414|T427|T4[3579]9)[1-4].|(?!(T3[679]9|T414|T427|T4[3579]9))(T3[6-9]|T4[0-9]|T50)..[1-4])(A|$|\b)

Thanks.

I know just enough about ICD-10 codes to be dangerous.

One of the recurring issues on this forum regarding ICD-10 and ICD-9 coding is that different organizations store the values slightly differently: some use periods, others use _ instead of periods, and I believe at least one organization uses custom informat/format pairs to create numeric ICD variables so that specific sort orders can be used.

So you should probably show some of the complete values that you are searching for.
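If the stored values do carry periods or underscores, one option is to strip them before applying the pattern. This is a minimal sketch, assuming the punctuation is purely a separator and the variables are named DIAG1-DIAG9 in a data set called `datahave`, as in the code posted earlier in this thread. The output name `normalized` is made up.

```sas
data normalized;
  set datahave;
  array injurydx[9] $ DIAG1-DIAG9;
  do i = 1 to 9;
    /* upper-case and drop periods, underscores and blanks, */
    /* so that 'T40.1X1A' becomes 'T401X1A'                 */
    injurydx[i] = compress(upcase(injurydx[i]), '._ ');
  end;
  drop i;
run;
```

The regular expression from the question can then be applied to the cleaned values unchanged.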

Another approach that has been used is a custom format for the specific disease, which allows things such as

If put(icdvar,$customformatname.) = '1' then ...

and moves the logic out to proc format.

If by any chance you actually have all of the codes in a data set it is very easy to create a CNTLIN data set for Proc format to create the needed format.
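A rough sketch of that CNTLIN approach follows. Everything named here is an assumption for illustration: the lookup data set `disease_codes`, the format name `$poisfmt`, and the two listed codes, which are examples only and not the full case definition. A real lookup table would need every qualifying code listed.

```sas
/* hypothetical lookup table of qualifying codes */
data disease_codes;
  input code $;
  datalines;
T401X1A
T391X1A
;
run;

/* build the CNTLIN data set: listed codes map to '1', everything else to '0' */
data cntlin;
  set disease_codes end=last;
  retain fmtname '$poisfmt' type 'C';
  start = code;
  label = '1';
  output;
  if last then do;
    hlo = 'O';      /* OTHER: catches any value not listed above */
    label = '0';
    output;
  end;
  keep fmtname type start label hlo;
run;

proc format cntlin=cntlin;
run;

data want;
  set datahave;
  array injurydx[9] $ DIAG1-DIAG9;
  DISEASE = 0;
  do i = 1 to 9;
    if put(injurydx[i], $poisfmt.) = '1' then DISEASE = 1;
  end;
  drop i;
run;
```

Unlike the regular expression, the format matches only exact values, so the trade-off is a longer code list in exchange for logic that is easier to audit.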

One of my concerns with regex and ICD-10 is that some code groups have so many levels. Interpreting some of those expressions, as you are finding, is not particularly easy, and it is hard to be sure they catch all of the cases involved.

Calcite | Level 5

## Re: regular expression ICD-10-CM

I did not know that different organizations implement ICD-10-CM slightly differently.

We have the ICD-10-CM codes in our dataset as character values.

Here is how they look:

| Obs | DIAG1 |
|---|---|
| 1 | I130 |
| 2 | J189 |
| 3 | S42201A |
| 4 | A047 |
| 5 | J440 |
| 6 | K5720 |
| 7 | J189 |
| 8 | O701 |
| 9 | Z3800 |
| 10 | Z3800 |
