Timg
Fluorite | Level 6

I have been working with sequences of ICD-10 health care diagnosis codes to uncover patterns in patient utilization. I am modeling my process on the market basket analysis often done in retail. My data is arranged as follows:

[Image: Timg_0-1646330370029.png]

 

Where:

  • “ITEM” is the target and contains ICD-10 codes, e.g., “ABNORMAL GLUCOSE COMP PREGNANCY”
  • “MCID” is my ID, e.g., “1,2,3”
  • “SEQ” is my sequence variable, e.g., “1,2,3”, converted from a date format in Base SAS. The data are sorted by ID and SEQ. SEQ can be as high as 50.

Sorry, I can’t share the data as it contains personal information. Below is an approximation:

MCID | ITEM                                | SEQ
1    | POISN METHYLPHENIDATE UNDET SEQUELA | 1
1    | SKELETAL FLUOROSIS THIGH            | 2
1    | ABNORMAL GLUCOSE COMP PREGNANCY     | 3
1    | LOW-TENSION GLAUCOMA RIGHT EYE      | 4
2    | NDSPLC FX PROX PHAL RT RF INIT OP   | 1
2    | OTH INJ RAD ART WRST HND LT ARM SEQ | 2
2    | INJ ABDUCENT NERVE RT SIDE INITIAL  | 3
2    | OTH INJ LT QUAD MUSC FASC TEND SEQ  | 4
3    | INF INFLM RXN PROS DEV GFT URIN SYS | 1
3    | NDSPL FX CAPITATE BN RT WRST SB RTN | 2
3    | LAC M&T LNG EXT TOE ANK FT UNS SIDE | 3
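
For anyone who wants to reproduce this layout outside SAS, here is a minimal Python sketch that turns long-format rows into one ordered diagnosis sequence per ID. It is only an illustration: the `rows` list is made up from the approximate table, and `build_sequences` is a hypothetical helper, not part of any SAS workflow.

```python
from collections import defaultdict

# Hypothetical rows mirroring the approximate table: (MCID, ITEM, SEQ).
# They are deliberately out of order to show that SEQ drives the ordering.
rows = [
    (1, "SKELETAL FLUOROSIS THIGH", 2),
    (1, "POISN METHYLPHENIDATE UNDET SEQUELA", 1),
    (2, "NDSPLC FX PROX PHAL RT RF INIT OP", 1),
    (1, "ABNORMAL GLUCOSE COMP PREGNANCY", 3),
    (2, "OTH INJ RAD ART WRST HND LT ARM SEQ", 2),
]


def build_sequences(rows):
    """Return {MCID: [ITEM, ...]} with each patient's items ordered by SEQ."""
    by_patient = defaultdict(list)
    for mcid, item, seq in rows:
        by_patient[mcid].append((seq, item))
    return {
        mcid: [item for _, item in sorted(pairs)]
        for mcid, pairs in by_patient.items()
    }


sequences = build_sequences(rows)
for mcid, items in sorted(sequences.items()):
    print(mcid, " -> ".join(items))
```

A dictionary of lists like this is a convenient input for the sequence comparisons discussed later in the thread.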

 

My analysis runs fine, but the results are shallow. Even with thousands of patients and 7 years of data, the rules are very basic, even at low confidence and support thresholds, and chain lengths reach only 3-4.

Example:

Chain Length | Transaction Count | Support(%) | Confidence(%) | PseudoLift | Rule
3            | 2896              | 2.005082   | 36.6768       | 1.587408   | LOW BACK PAIN ==> CONTCT EXPS OTH VIRL COMMUNICABL DZ ==> CONTACT W/AND (SUSP) EXPOS COVID-19
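
To make the Support and Confidence columns concrete, here is a minimal Python sketch using common textbook definitions for sequence rules: support is the share of IDs whose sequence contains the chain in order, and confidence is that support divided by the support of the chain without its last item. This is an assumption for illustration only. The Association node's exact definitions (in particular PseudoLift and the effect of Consolidate Time) may differ, and the toy `patients` data is invented.

```python
def contains_in_order(sequence, chain):
    """True if the chain's items occur in the sequence in the same order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so order is enforced.
    return all(item in remaining for item in chain)


def support(sequences, chain):
    """Percent of patients whose sequence contains the chain."""
    hits = sum(contains_in_order(seq, chain) for seq in sequences.values())
    return 100.0 * hits / len(sequences)


def confidence(sequences, chain):
    """Percent of patients with the full chain among those with the chain minus its last item.

    Assumes at least one patient matches the shortened chain.
    """
    return 100.0 * support(sequences, chain) / support(sequences, chain[:-1])


# Invented toy data: four patients, items abbreviated to single letters.
patients = {
    1: ["A", "B", "C"],
    2: ["A", "C"],
    3: ["B", "A", "C"],
    4: ["A", "B"],
}

rule = ["A", "B", "C"]
print("support(%)   :", support(patients, rule))
print("confidence(%):", confidence(patients, rule))
```

In the toy data only patient 1 follows A ==> B ==> C in order, while patients 1 and 4 follow A ==> B, which is why the chain gets longer only when many patients share the exact same ordering.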

 

I think the problem lies in the alignment of each subject’s diagnosis sequences: there are groups of subjects with similar patterns, but the patterns are not aligned in a way that SAS can process. I experimented with the Chain Count and Consolidate Time settings in the Association node, but “0” seemed to work best. My next step is to investigate TS Data Prep and TS Similarity / Time Warping, but I am not finding any good learning resources, and TS Data Prep needs my target variable to be interval, which I cannot do (ICD-10 codes are factors / nominal).

Thanks for reading this long post. I would appreciate any learning documents or comments on how I might get a better analysis.

4 REPLIES
sbxkoenk
SAS Super FREQ

Hello,

 

The 6 nodes in the Time Series tab of your Enterprise Miner diagram will not get you any further, as they are designed for numerical time series.

 

Recurrent Neural Networks (RNNs) are specifically designed to handle sequence data, such as speech, text, time series, and so on. RNNs are called recurrent because they perform the same task for every element of a sequence. The output for each element depends on the computations of its preceding elements.
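
As a rough illustration of the recurrence idea, here is a minimal Python sketch of a single-unit recurrent update. It is not a trained model and not SAS or Viya code: the weights `w_in` and `w_rec` are arbitrary assumed values, and a real RNN for diagnosis codes would use learned weights and vector inputs.

```python
import math


def rnn_forward(inputs, w_in=0.5, w_rec=0.8, bias=0.0):
    """Apply h_t = tanh(w_in * x_t + w_rec * h_(t-1) + bias) to every element.

    The same update is applied at each step, and each state depends on
    the previous one, so the order of the inputs matters.
    """
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h + bias)
        states.append(h)
    return states


# Same values, different order: the final hidden states differ.
early = rnn_forward([1.0, 0.0, 0.0])
late = rnn_forward([0.0, 0.0, 1.0])
print("signal first:", early[-1])
print("signal last :", late[-1])
```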

Unfortunately, RNNs are NOT in Enterprise Miner (but they are in SAS VIYA Model Studio).

You could call Python RNNs from Enterprise Miner though.

 

What you can also do:

  • Turn every ICD-10 diagnosis code into its own variable.
  • Give each patient a 1 / 0 (Y / N) value for every diagnosis.
  • Calculate the distance between patients (for example with PROC DISTANCE and the Jaccard coefficient).
  • Use the distance matrix to cluster the patients.

However, this approach disregards the sequence of events, so it may not be what you want.
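
The distance step can be sketched in a few lines of Python. This is a minimal illustration of the Jaccard distance on sets of diagnoses, which is equivalent to comparing 1 / 0 indicator columns. It is not PROC DISTANCE, whose options and output may differ, and the patients and shortened diagnosis labels are invented.

```python
from itertools import combinations

# Invented patients with shortened diagnosis labels; repeats and order are ignored.
patients = {
    1: ["LOW BACK PAIN", "COVID EXPOSURE", "LOW BACK PAIN"],
    2: ["LOW BACK PAIN", "COVID EXPOSURE", "GLAUCOMA"],
    3: ["WRIST FRACTURE"],
}


def jaccard_distance(a, b):
    """1 - |A and B| / |A or B| on the sets of diagnoses; 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Pairwise distances, keyed by (patient, patient); input for any clustering method.
distances = {
    (p, q): jaccard_distance(patients[p], patients[q])
    for p, q in combinations(sorted(patients), 2)
}

for pair, dist in distances.items():
    print(pair, round(dist, 3))
```

Patients 1 and 2 share two of three distinct diagnoses and end up close together, while patient 3 shares nothing with either.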

 

Good luck,

Koen

Timg
Fluorite | Level 6

Thanks Koen,

 

So you think Viya could handle ICD-10 codes as factors and do time warping on them?

sbxkoenk
SAS Super FREQ

Hello,

 

I do not think you can use recurrent neural networks (RNNs) in SAS VIYA to do dynamic time warping.

But your question is an interesting one. ( Time warping on time-stamped sequences of ICD – 10 codes )

Let me investigate.

 

Kind regards,

Koen

sbxkoenk
SAS Super FREQ

Hello @Timg ,

 

I asked two colleagues for information.

Here is what I found out (thanks to them).

 

SAX (Symbolic Aggregate approXimation) has some way to measure the distance between two strings that represent time series. See

https://go.documentation.sas.com/doc/en/pgmsascdc/v_016/castsp/castsp_tsd_sect047.htm

https://jmotif.github.io/sax-vsm_site/morea/algorithm/SAX.html
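
For orientation, here is a minimal Python sketch of textbook SAX: z-normalize a numeric series, average it over equal-width segments, map each segment mean to a letter, and compare two words with the MINDIST lower bound. It is an assumption-laden illustration, not the TSD package: it uses an alphabet of size 3 with the standard Gaussian breakpoints (about -0.43 and 0.43), requires the series length to be divisible by the number of segments, and all function names are hypothetical.

```python
import math
from bisect import bisect_left

ALPHABET = "abc"
BREAKPOINTS = [-0.43, 0.43]  # equiprobable N(0, 1) cut points for 3 symbols


def sax_word(series, n_segments):
    """Convert a non-constant numeric series into a SAX word of n_segments letters."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    z = [(x - mean) / std for x in series]
    size = n // n_segments  # assumes n is divisible by n_segments
    paa = [sum(z[i * size:(i + 1) * size]) / size for i in range(n_segments)]
    return "".join(ALPHABET[bisect_left(BREAKPOINTS, v)] for v in paa)


def symbol_dist(r, c):
    """Distance between two letters; identical or adjacent letters count as 0."""
    i, j = sorted((ALPHABET.index(r), ALPHABET.index(c)))
    return 0.0 if j - i <= 1 else BREAKPOINTS[j - 1] - BREAKPOINTS[i]


def mindist(word_a, word_b, n):
    """MINDIST between two equal-length SAX words from series of length n."""
    w = len(word_a)
    total = sum(symbol_dist(r, c) ** 2 for r, c in zip(word_a, word_b))
    return math.sqrt(n / w) * math.sqrt(total)


rising = [1, 2, 3, 4, 5, 6, 7, 8]
falling = rising[::-1]
word_up, word_down = sax_word(rising, 2), sax_word(falling, 2)
print(word_up, word_down, round(mindist(word_up, word_down, len(rising)), 4))
```

Note that SAX starts from numeric series, so it does not by itself solve the problem of nominal diagnosis codes.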

 

Since the TSD (Time Series Distance) package does not accept text sequences as input data, you cannot use TSD/DTW (Dynamic Time Warping) directly.

However, if you map the text items to numbers, assuming you know all the words that occur in all the text sequences, you can use DTW in the TSD package with timeid = obs.
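
Outside SAS, the mapping-plus-DTW idea can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the TSD package: `encode` and `dtw` are hypothetical helpers, the integer codes are assigned in arbitrary order of first appearance, and the default cost is a 0/1 mismatch rather than a numeric difference, because the difference between two arbitrary codes depends only on how the mapping happened to be built.

```python
def encode(sequences):
    """Map every distinct item to an integer code (order of first appearance)."""
    vocab = {}
    encoded = [
        [vocab.setdefault(item, len(vocab) + 1) for item in seq]
        for seq in sequences
    ]
    return encoded, vocab


def dtw(a, b, cost=lambda x, y: 0.0 if x == y else 1.0):
    """Classic dynamic time warping distance between two sequences."""
    inf = float("inf")
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = cost(a[i - 1], b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or advance both.
            d[i][j] = step + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]


# Invented sequences, items abbreviated to single letters.
base = ["A", "B", "C"]
stretched = ["A", "A", "B", "B", "C"]   # same pattern, repeated visits
different = ["A", "X", "C"]             # one diagnosis replaced

(enc_base, enc_stretched, enc_different), vocab = encode([base, stretched, different])
print("base vs stretched:", dtw(enc_base, enc_stretched))
print("base vs different:", dtw(enc_base, enc_different))
```

The stretched sequence has distance 0 from the base pattern, which is the kind of misalignment the original question describes.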

 

The longest common subsequence example in the TSD package uses PROC FORMAT for a similar problem.

https://go.documentation.sas.com/doc/en/pgmsascdc/v_016/castsp/castsp_tsd_sect084.htm
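
As a rough idea of what a longest common subsequence gives you, here is a minimal Python sketch of the standard dynamic-programming algorithm applied directly to text items. It is not the TSD package example from the link above, and the two patient sequences are invented.

```python
def lcs(a, b):
    """Return one longest common subsequence of a and b (order kept, gaps allowed)."""
    m, n = len(a), len(b)
    # table[i][j] = LCS length of a[:i] and b[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back through the table to recover the items themselves.
    result = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return result[::-1]


# Invented sequences with shortened diagnosis labels.
patient_a = ["LOW BACK PAIN", "FLU", "COVID EXPOSURE", "GLAUCOMA"]
patient_b = ["COUGH", "LOW BACK PAIN", "COVID EXPOSURE"]

shared = lcs(patient_a, patient_b)
print(" ==> ".join(shared))
```

The length of the shared subsequence, relative to the lengths of the two sequences, can serve as a similarity measure that respects order without requiring the diagnoses to occur at the same positions.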

 

Note: For the TSD (Time Series Distance Measure) Package, you need a Visual Forecasting license in SAS VIYA 3.5+.

TSD contains SAX and DTW.

Visual Forecasting also offers PROC TSMODEL.

 

Good luck,

Koen
