☑ This topic is solved.
thistleandtweed
Fluorite | Level 6

Hi everyone!

I'm currently learning SAS programming, and I wanted to embark on my own project for now. I have access to SAS Viya, so I was thinking of conducting unsupervised classification of emails (multi-class classification) through SAS Visual Data Mining and Machine Learning (VDMML) and SAS Visual Text Analytics (VTA).

I was thinking of running the text through VTA, extracting the score code from the Categories node, and then processing the scored data in VDMML to train a classification model. However, I'm not sure what kind of pipeline would suit this approach, as most of the existing pipeline templates seem to cater to supervised learning.

Any help in this area would be appreciated. Apologies if this is a very basic question, and thank you!

1 ACCEPTED SOLUTION

sbxkoenk
SAS Super FREQ

I'm not so sure this is the best small project for learning SAS programming ... but anyway.

In SAS terminology, multi-class classification (like multi-label classification) is always supervised.

For unlabeled e-mails, you probably need unsupervised techniques instead: clustering or topic detection.

If there's no pipeline template for clustering in Model Studio (VDMML), you can always build such a pipeline yourself, starting from a Data node (or an empty pipeline).

After you have used Singular Value Decomposition (SVD) or Latent Dirichlet Allocation (LDA) to reduce the dimensionality of the weighted term-by-document frequency matrix, you can apply a clustering algorithm to the reduced representation. But every e-mail will then belong to exactly one cluster. If you use topic detection in VTA instead, a single e-mail may contain several topics.
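To make the idea concrete outside of Model Studio, here is a minimal sketch in plain Python (not SAS code, and not what VTA or VDMML run internally). It builds a small TF-IDF weighted matrix, approximates a rank-2 SVD by power iteration, and runs k-means on the reduced document vectors. The four sample e-mails, the helper names, the smoothed IDF formula, and the farthest-point initialisation are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def tfidf_rows(docs):
    """Weighted document-by-term matrix: one TF-IDF row per document."""
    tokens = [d.lower().split() for d in docs]
    vocab = sorted({t for toks in tokens for t in toks})
    df = Counter(t for toks in tokens for t in set(toks))
    n = len(docs)
    # Smoothed IDF (an assumed weighting; other schemes work as well).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for toks in tokens:
        tf = Counter(toks)
        rows.append([tf[t] * idf[t] for t in vocab])
    return rows


def top_right_singular_vectors(rows, k, iters=200, seed=0):
    """Approximate the top-k right singular vectors (power iteration + deflation)."""
    rng = random.Random(seed)
    n_terms = len(rows[0])
    basis = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n_terms)]
        for _ in range(iters):
            # Multiply by A^T A: first A v (one value per document), then A^T (A v).
            av = [dot(row, v) for row in rows]
            v = [sum(row[t] * s for row, s in zip(rows, av)) for t in range(n_terms)]
            # Remove components along the vectors already found, then normalise.
            for b in basis:
                proj = dot(v, b)
                v = [x - proj * y for x, y in zip(v, b)]
            norm = math.sqrt(dot(v, v))
            v = [x / norm for x in v]
        basis.append(v)
    return basis


def kmeans(points, k, iters=20):
    """Plain k-means; returns exactly one cluster label per point."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Farthest-point initialisation keeps this small example deterministic.
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


# Hypothetical e-mails: two about billing, two about meetings.
emails = [
    "invoice payment overdue invoice",
    "payment invoice reminder",
    "meeting agenda schedule meeting",
    "schedule meeting room",
]
rows = tfidf_rows(emails)
basis = top_right_singular_vectors(rows, k=2)
reduced = [[dot(row, b) for b in basis] for row in rows]  # documents in 2 dimensions
labels = kmeans(reduced, k=2)
print(labels)  # one label per e-mail
```

Each e-mail ends up with a single cluster label, which is the restriction mentioned above. A topic-style approach would instead keep a score per component for every e-mail, so one message could exceed the cutoff for several topics.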

 

Koen
