Enterprise Miner Association Node with TS Similarity

Timg · Posted 03-03-2022 01:02 PM

I have been working with ICD-10 health care diagnosis codes arranged in sequence to uncover patterns in patient utilization. I am modeling my process on the market basket analysis often done in retail. My data is arranged as follows:

Where:

“ITEM” is the target and contains ICD-10 codes, e.g., “ABNORMAL GLUCOSE COMP PREGNANCY”
“MCID” is my ID, e.g., “1,2,3”
“SEQ” is my sequence variable, e.g., “1,2,3”, converted from a date format in Base SAS. The data are sorted by ID and SEQ. SEQ can be as high as 50.

Sorry, I can’t share the data as it contains personal information. Below is an approximation:

MCID	ITEM	SEQ
1	POISN METHYLPHENIDATE UNDET SEQUELA	1
1	SKELETAL FLUOROSIS THIGH	2
1	ABNORMAL GLUCOSE COMP PREGNANCY	3
1	LOW-TENSION GLAUCOMA RIGHT EYE	4
2	NDSPLC FX PROX PHAL RT RF INIT OP	1
2	OTH INJ RAD ART WRST HND LT ARM SEQ	2
2	INJ ABDUCENT NERVE RT SIDE INITIAL	3
2	OTH INJ LT QUAD MUSC FASC TEND SEQ	4
3	INF INFLM RXN PROS DEV GFT URIN SYS	1
3	NDSPL FX CAPITATE BN RT WRST SB RTN	2
3	LAC M&T LNG EXT TOE ANK FT UNS SIDE	3

My analysis runs fine, but I am finding the results shallow. Even with thousands of patients and 7 years of data, the rules are very basic, even at low confidence and support, and only have chain lengths of 3-4.

Example:

Chain Length	Transaction Count	Support(%)	Confidence(%)	PseudoLift	Rule
3	2896	2.005082	36.6768	1.587408	LOW BACK PAIN ==> CONTCT EXPS OTH VIRL COMMUNICABL DZ ==> CONTACT W/AND (SUSP) EXPOS COVID-19
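For readers unfamiliar with the columns above, here is a minimal Python sketch of how support and confidence are commonly defined for a sequence rule. It is an illustration, not the Association node's actual computation: it assumes support is the share of patients whose history contains the whole chain in order (gaps allowed) and confidence is that count divided by the count of patients containing the chain without its last item. The letters A to D and the helper names are hypothetical placeholders, not real codes.

```python
def contains_in_order(sequence, pattern):
    """True if the items of pattern occur in sequence in the same order (gaps allowed)."""
    it = iter(sequence)
    # "item in it" advances the iterator, so each match must come after the previous one.
    return all(item in it for item in pattern)


def rule_stats(sequences, pattern):
    """Return (count, support %, confidence %) for the chain given by pattern."""
    n_pattern = sum(contains_in_order(s, pattern) for s in sequences)
    # Assumed definition: confidence is relative to the chain minus its last item.
    n_prefix = sum(contains_in_order(s, pattern[:-1]) for s in sequences)
    support = 100.0 * n_pattern / len(sequences)
    confidence = 100.0 * n_pattern / n_prefix if n_prefix else 0.0
    return n_pattern, support, confidence


# Hypothetical toy data: one ordered list of items per patient (MCID).
patients = {
    1: ["A", "B", "C", "D"],
    2: ["A", "C", "B"],
    3: ["B", "A", "C"],
    4: ["A", "B", "D"],
}

count, support, confidence = rule_stats(list(patients.values()), ["A", "B", "C"])
print(count, support, round(confidence, 2))
```

Under these assumed definitions, only patient 1 contains A ==> B ==> C in order, while three patients contain A ==> B, which is why long chains lose support quickly when histories are noisy.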

I think the problem lies with the alignment of each subject’s diagnosis sequences. That is, there are groups of subjects that have similar patterns, but they are not aligned in a way that SAS can process. I experimented with adjusting Chain Count and Consolidate Time in the Association node, but “0” seemed to work best. My next step is to investigate TS Data Prep and TS Similarity / Time Warping, but I am not finding any good learning resources, and TS Data Prep needs my target variable to be interval, which I cannot do (ICD-10 codes are factors / nominal).

Thanks for reading this long post. I would appreciate any learning documents or comments on how I might get a better analysis.

sbxkoenk · Posted 03-04-2022 02:08 PM

Hello,

The 6 nodes in the Time Series tab of your Enterprise Miner diagram will not get you any further, as they are designed for numerical time series.

Recurrent Neural Networks (RNNs) are specifically designed to handle sequence data, such as speech, text, time series, and so on. RNNs are called recurrent because they perform the same task for every element of a sequence. The output for each element depends on the computations of its preceding elements.

Unfortunately, RNNs are NOT in Enterprise Miner (but they are in SAS VIYA Model Studio).

You could call Python RNNs from Enterprise Miner though.

What you can also do:

Turn each ICD-10 diagnosis code into its own variable.
Give every patient a 1 / 0 (Y / N) value for every diagnosis.
You can then calculate the distance between patients (for example with PROC DISTANCE and the Jaccard coefficient).
Using the distance matrix, you can then cluster the patients.
However, the above approach disregards the sequence of events, so it may not be what you want.
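The indicator-and-distance steps above can be sketched in plain Python rather than PROC DISTANCE, just to show the arithmetic. This is a minimal sketch under stated assumptions: the records, the shortened item names, and the helper names are hypothetical, and the Jaccard distance is taken as 1 minus (codes shared by both patients) / (codes present in either patient), which is the same as working on 1 / 0 indicators while ignoring joint absences.

```python
from itertools import combinations

# Hypothetical (MCID, ITEM) records; SEQ is deliberately ignored by this approach.
records = [
    (1, "LOW BACK PAIN"),
    (1, "LOW-TENSION GLAUCOMA"),
    (2, "LOW BACK PAIN"),
    (2, "LOW-TENSION GLAUCOMA"),
    (2, "SKELETAL FLUOROSIS"),
    (3, "WRIST FRACTURE"),
]


def code_sets(rows):
    """Collect the set of diagnoses per patient (equivalent to the 1 / 0 indicators)."""
    sets = {}
    for mcid, item in rows:
        sets.setdefault(mcid, set()).add(item)
    return sets


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(sets):
    """Symmetric patient-by-patient distances, stored as a dict keyed by (id, id)."""
    ids = sorted(sets)
    dist = {(i, i): 0.0 for i in ids}
    for i, j in combinations(ids, 2):
        d = jaccard_distance(sets[i], sets[j])
        dist[(i, j)] = dist[(j, i)] = d
    return dist


dist = distance_matrix(code_sets(records))
for pair in [(1, 2), (1, 3), (2, 3)]:
    print(pair, round(dist[pair], 3))
```

The resulting matrix could then be handed to any clustering routine that accepts precomputed distances. With thousands of patients the pairwise loop grows quadratically, so a real run would likely need a vectorized or SAS-side implementation.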

Good luck,

Koen

Timg · Posted 03-07-2022 03:07 PM

Thanks Koen,

So you think Viya could handle ICD-10 codes as factors and do time warping on them?

sbxkoenk · Posted 03-07-2022 04:02 PM

Hello,

I do not think you can use recurrent neural networks (RNNs) in SAS VIYA to do dynamic time warping.

But your question is an interesting one (time warping on time-stamped sequences of ICD-10 codes).

Let me investigate.

Kind regards,

Koen

sbxkoenk · Posted 03-08-2022 11:51 AM

Hello @Timg ,

I asked two colleagues for information.

Here is what I found out (thanks to them).

SAX (Symbolic Aggregate approXimation) has some way to measure the distance between two strings that represent time series. See

https://go.documentation.sas.com/doc/en/pgmsascdc/v_016/castsp/castsp_tsd_sect047.htm

https://jmotif.github.io/sax-vsm_site/morea/algorithm/SAX.html

Since the TSD (Time Series Distance) package does not accept text sequences as input data, you cannot use TSD/DTW (Dynamic Time Warping) directly.

However, if you map the text items to numbers, assuming you know all the words that occur in all the text sequences, you can use DTW in the TSD package with timeid = obs.

The longest common subsequence example in the TSD package uses PROC FORMAT for a similar problem.
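To make the mapping idea concrete, here is a minimal Python sketch of the general technique, not of the TSD package itself. It assigns each distinct item an integer and then runs a textbook DTW recurrence. One assumption to flag loudly: because the integers carry no order, the sketch uses a 0 / 1 mismatch cost instead of a numeric difference, so the result does not depend on how the codes happen to be numbered. Whether the TSD package behaves this way is not something this sketch claims. The item names and helper names are hypothetical.

```python
def build_code_map(sequences):
    """Assign an arbitrary integer to every distinct item across all sequences."""
    codes = sorted({item for seq in sequences for item in seq})
    return {code: idx for idx, code in enumerate(codes, start=1)}


def mismatch(x, y):
    """Local cost for nominal data: 0 if the codes are equal, 1 otherwise (an assumption)."""
    return 0.0 if x == y else 1.0


def dtw_distance(a, b, cost=mismatch):
    """Classic dynamic time warping: cheapest monotone alignment of a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(a[i - 1], b[j - 1])
            # Extend the best of: repeat b[j-1], repeat a[i-1], or advance both.
            d[i][j] = step + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


# Hypothetical diagnosis sequences, ordered by SEQ within each patient.
seq1 = ["BACK PAIN", "EXPOSURE", "COVID"]
seq2 = ["BACK PAIN", "BACK PAIN", "EXPOSURE", "EXPOSURE", "COVID"]
seq3 = ["BACK PAIN", "FRACTURE", "COVID"]

code_map = build_code_map([seq1, seq2, seq3])
num1, num2, num3 = ([code_map[item] for item in s] for s in (seq1, seq2, seq3))

print(dtw_distance(num1, num2))  # same pattern, stretched in time
print(dtw_distance(num1, num3))  # one diagnosis differs
```

The first pair comes out at distance 0 because warping absorbs the repeated visits, which is the kind of misalignment described in the original question. The second pair costs 1 because one diagnosis has no match.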

https://go.documentation.sas.com/doc/en/pgmsascdc/v_016/castsp/castsp_tsd_sect084.htm
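As a rough illustration of what a longest common subsequence measure does with nominal sequences, here is a small Python sketch. It is not the TSD package's code; the normalization by the shorter sequence length, the placeholder items A to D and X, and the function names are assumptions made for the example.

```python
def lcs_length(a, b):
    """Length of the longest subsequence common to a and b (order kept, gaps allowed)."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def lcs_similarity(a, b):
    """LCS length scaled to [0, 1] by the shorter sequence (an assumed normalization)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / min(len(a), len(b))


# Hypothetical item sequences for two patients, ordered by SEQ.
patient_a = ["A", "B", "C", "D"]
patient_b = ["B", "X", "D"]

print(lcs_length(patient_a, patient_b))
print(round(lcs_similarity(patient_a, patient_b), 3))
```

Unlike DTW, this comparison only tests items for equality, so it works on nominal codes without any numeric cost, and unmatched diagnoses are skipped rather than penalized.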

Note: For the TSD (Time Series Distance Measure) Package, you need a Visual Forecasting license in SAS VIYA 3.5+.

TSD contains SAX and DTW.

Visual Forecasting also offers PROC TSMODEL.

Good luck,

Koen
