Solved: Classifying emails with SAS VTA and VDMML

thistleandtweed · Posted 05-30-2024 01:56 AM

Hi everyone!

I'm currently learning SAS programming, and I wanted to embark on my own project for now. I have access to SAS Viya, so I was thinking of conducting unsupervised classification of emails (multi-class classification) through VDMML and VTA.

I was thinking of running the text through VTA, extracting the score code from the Categories node, and then processing the scored data in VDMML to train a classification model. However, I'm not sure which kind of pipeline would suit this approach, as most of the existing pipeline templates seem to cater to supervised learning.

Any help in this area would be appreciated. Apologies if this is a very basic question, and thank you.

sbxkoenk · Posted 06-03-2024 07:21 AM

I'm not so sure this is the best little project to learn SAS programming ... but anyway.

In SAS terminology, multi-class classification (like multi-label classification) is always supervised.

What you probably need is unsupervised learning instead: clustering or topic detection.

If there's no pipeline template for clustering in Model Studio (VDMML), you can always build such a pipeline yourself starting from a data node (or an empty pipeline).

After you have used Singular Value Decomposition (SVD) or Latent Dirichlet Allocation (LDA) to reduce the dimensionality of the weighted term-by-document frequency matrix, you can apply a clustering algorithm to the reduced representation. Keep in mind that with clustering every e-mail belongs to exactly one cluster, whereas with topic detection in VTA a single e-mail may contain several topics.
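To make the one-cluster versus several-topics difference concrete, here is a minimal Python sketch (plain standard library, not SAS code and not what VTA or VDMML run internally). Everything in it is an assumption for illustration: the four toy e-mails, the TF-IDF weighting, the 0.5 threshold, and above all the two hand-picked term groups, which merely stand in for the dimensions that SVD or LDA would derive from the data.

```python
import math
from collections import Counter

# Toy e-mails, made up for illustration only.
emails = [
    "invoice payment due invoice",
    "payment refund invoice",
    "meeting schedule agenda",
    "meeting agenda refund payment",
]

# Hand-picked term groups: a stand-in for SVD/LDA dimensions (assumption).
topics = {
    "billing": {"invoice", "payment", "refund", "due"},
    "meetings": {"meeting", "schedule", "agenda"},
}


def tfidf_rows(docs):
    """Return one {term: weight} dict per document (tf * log(N / df))."""
    tokenized = [d.split() for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for toks in tokenized for t in set(toks))
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        rows.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return rows


def topic_scores(row):
    """Sum the weights of the terms belonging to each topic."""
    return {
        name: sum(w for term, w in row.items() if term in terms)
        for name, terms in topics.items()
    }


scores = [topic_scores(row) for row in tfidf_rows(emails)]

# Clustering-style (hard) assignment: exactly one label per e-mail.
hard = [max(s, key=s.get) for s in scores]

# Topic-style assignment: every topic whose score passes the threshold.
THRESHOLD = 0.5  # arbitrary cut-off chosen for this toy data
multi = [[name for name, v in s.items() if v >= THRESHOLD] for s in scores]

for text, h, m in zip(emails, hard, multi):
    print(f"{text!r}: cluster={h}, topics={m}")
```

The last e-mail mentions both a meeting and a refund: the hard assignment has to pick one label for it, while the threshold-based assignment keeps both.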

Koen
