chuie
Quartz | Level 8

Hi There,

 

I found this diagram and article, in which the authors built a statistical model to identify a high-risk group and then ran unsupervised clustering on that group.

I am not sure what the point of the unsupervised clustering is, since the statistical modeling already tells us which features (variable importance, nodes, etc.) the high-risk group entails.

This is a great article, but I couldn't understand the logic behind that step.

Please help

Thanks

C

https://yougottabelieve.info/case-control-study-vs-cohort-study-retrospective 

 

 

[Attached diagram: IclusterPNG.PNG]

4 REPLIES 4
Reeza
Super User
Link doesn't work. From the diagram it looks like the High Risk group was used for the unsupervised clustering - and this is usually done to tell us what we don't know. Yes, we know some variable importance, but exactly how that falls out for this subgroup may be different. Unsupervised clustering may be counter to what we expect so it's a good step to go through to either confirm or reject assumptions.
chuie
Quartz | Level 8

 

For some reason the link to the article doesn't work, so I have pasted the unsupervised clustering section below. I still do not get its purpose, as the PCA and the relevant variables for this high-risk group were already obtained from the decision tree / variable importance chart.

 

 

Unsupervised Clustering: Subgroup Analysis of High-Risk Patients

We used principal component analysis [21] to reduce high-dimensional EMR features and identify clinically relevant groups of patients of high risk for 6-month ED visit with similar patterns of demographics, primary diagnosis and procedure, and chronic disease conditions. The features for high-risk patients were projected to a lower dimensional subspace with largest variances. The K-means algorithm was applied to find potential patient patterns for future 6-month ED visit [22]. We used K=6 to generate the final six clusters. The technical details are described in Multimedia Appendix 9. Clustering patterns between retrospective and prospective cohorts were compared to further validate our high-risk case finding algorithm. As part of the health care management platform, our predictive model was integrated onto a Web-based dashboard to provide a real-time visualization of the population profile with ED 6-month visits.
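
To make the two steps in that section concrete, here is a minimal sketch in plain Python of "project onto the top principal components, then run K-means with K=6". It is only an illustration under stated assumptions: the data are randomly generated, the choice of 2 components and 5 features is arbitrary, and the paper's actual implementation is not known. In practice one would normally use a library routine (or a SAS procedure) rather than hand-rolled loops.

```python
import random

def center(rows):
    """Subtract column means so variance is measured around the mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def principal_components(rows, n_components, iters=300, seed=0):
    """Top principal directions by power iteration with deflation.
    `rows` must already be centered."""
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[a] * r[b] for r in rows) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        w = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            w = matvec(cov, w)
            norm = sum(x * x for x in w) ** 0.5
            w = [x / norm for x in w]
        variance = sum(a * b for a, b in zip(w, matvec(cov, w)))
        components.append(w)
        # Deflation: remove the variance already explained by w.
        cov = [[cov[a][b] - variance * w[a] * w[b] for b in range(d)]
               for a in range(d)]
    return components

def project(rows, components):
    """Principal component scores: one coordinate per component."""
    return [[sum(a * b for a, b in zip(r, w)) for w in components]
            for r in rows]

def kmeans(points, k, max_iters=100, seed=0):
    """Plain Lloyd's algorithm with Euclidean distance."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2
                                  for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids

# Made-up "high-risk patients": 6 hidden groups, 20 patients each, 5 features.
rng = random.Random(42)
group_means = [[rng.uniform(-5, 5) for _ in range(5)] for _ in range(6)]
patients = [[m + rng.gauss(0, 0.5) for m in group_means[g]]
            for g in range(6) for _ in range(20)]

centered = center(patients)
components = principal_components(centered, n_components=2)
scores = project(centered, components)       # lower-dimensional subspace
labels, centroids = kmeans(scores, k=6)      # K=6 as in the excerpt
```

Note that the risk model is not involved at all here: the input is only the feature vectors of patients already flagged as high risk, and the output is a grouping of those patients by similarity.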

Reeza
Super User
> The technical details are described in Multimedia Appendix 9

Do you have access to that?
chuie
Quartz | Level 8

It just explains how to do it, not why 🙂

 

********************************************

Multimedia Appendix 9. Unsupervised clustering of high-risk population using PCA.

To reduce high-dimensional EMR features for detecting cohort pattern, we used principal component analysis (PCA) to divide the high-risk patients of future 6-month ED visit identified by our algorithm in the prospective cohort into distinctive groups, based on demographics, primary diagnosis and procedure, and chronic disease conditions. The features for high-risk patients are projected to a lower dimensional subspace with largest variances.

Where X_i is the EMR feature matrix for each high-risk patient, and w_k is the set of vectors of weights that map each patient feature vector X_i to a new vector of principal component scores T_i^k. And we computed w_1 by solving the following objective functions (1) and (2), and w_k by iterating objective function (3) based on the first k-1 principal components.
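
The equations themselves did not come through in the paste. Assuming the appendix follows the textbook PCA formulation, which the symbol definitions above match, the score definition and objective functions (1) to (3) would look roughly like this. This is a reconstruction, not the paper's exact notation; X denotes the centered matrix whose rows are the X_i, and the index s is introduced here for the sum.

```latex
\begin{align}
T_i^{k} &= X_i \cdot w_k
  && \text{(score of patient $i$ on component $k$)} \\
w_1 &= \arg\max_{\lVert w \rVert = 1} \sum_i \left( X_i \cdot w \right)^2
     = \arg\max_{\lVert w \rVert = 1} \lVert X w \rVert^2
  && \text{(first component: largest variance)} \\
\hat{X}_k &= X - \sum_{s=1}^{k-1} X w_s w_s^{\mathsf{T}}, \qquad
w_k = \arg\max_{\lVert w \rVert = 1} \lVert \hat{X}_k w \rVert^2
  && \text{(remove first $k-1$ components, repeat)}
\end{align}
```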
And then the K-means algorithm was applied on top of the principal component T_i^k subspace of PCA to find potential patient patterns for future 6-month ED visit. We used K=6 to implement the initial k means set for the algorithm and calculate the Euclidean centroid m to generate final clusters.

Where C_i is the ith cluster in total 6 clusters, and x represents the previous principal components T^k.
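
The clustering equation is also missing from the paste. With the symbols defined above (clusters C_i, centroid m, points x in principal component space), the usual K-means objective would be the following. Again, this is the standard form assumed from context, not a copy of the appendix; the subscript on m_i is added here to distinguish the six centroids.

```latex
\begin{equation}
\underset{C_1, \dots, C_6}{\arg\min} \;
\sum_{i=1}^{6} \sum_{x \in C_i} \lVert x - m_i \rVert^2,
\qquad
m_i = \frac{1}{\lvert C_i \rvert} \sum_{x \in C_i} x
\end{equation}
```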
Unique patterns revealed by the clustering results were analyzed to characterize the high-risk subjects identified by our ED algorithm.
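
That last sentence is probably the closest the appendix gets to a "why". A rough sketch of what "characterize" could mean in practice: once each high-risk patient has a cluster label, summarize the original features per cluster and compare the profiles. Everything below is invented for illustration (the feature names, the six patients, and the labels, which would come from K-means in a real analysis); `cluster_profiles` is a hypothetical helper, not something from the paper.

```python
from collections import defaultdict

def cluster_profiles(rows, labels, feature_names):
    """Mean of each original feature within each cluster."""
    grouped = defaultdict(list)
    for row, lab in zip(rows, labels):
        grouped[lab].append(row)
    profiles = {}
    for lab, members in sorted(grouped.items()):
        means = [sum(col) / len(members) for col in zip(*members)]
        profiles[lab] = dict(zip(feature_names, means))
    return profiles

# Hypothetical high-risk patients: all were flagged by the risk model,
# but the two clusters differ in *why* they are high risk.
feature_names = ["age", "has_copd", "has_diabetes", "prior_ed_visits"]
rows = [
    [78, 1, 0, 2], [82, 1, 0, 3], [75, 1, 0, 2],   # older, COPD
    [34, 0, 1, 6], [29, 0, 1, 7], [41, 0, 1, 5],   # younger, diabetes
]
labels = [0, 0, 0, 1, 1, 1]  # assumed; would come from K-means in practice

profiles = cluster_profiles(rows, labels, feature_names)
for lab, profile in profiles.items():
    print(lab, {name: round(value, 2) for name, value in profile.items()})
```

Variable importance from the supervised model says which features drive risk across the whole population. A per-cluster profile like this one shows which combinations of those features occur together inside the high-risk group, which is a different question.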