Classifying emails with SAS VTA and VDMML

thistleandtweed — Thu, 30 May 2024 08:04:45 GMT

Hi everyone!

I'm currently learning SAS programming and want to try a project of my own. I have access to SAS Viya, so I was thinking of doing unsupervised classification of emails (multi-class classification) with Visual Text Analytics (VTA) and Visual Data Mining and Machine Learning (VDMML).

I was thinking of running the text through VTA, extracting the score code from the Categories node, and then processing the scored data for use in VDMML to train a classification model. However, I'm not sure what kind of pipeline would suit this approach, as most of the existing pipeline templates seem geared towards supervised learning.

Any help in this area would be appreciated. Apologies if this is a very basic question, and thank

Re: Classifying emails with SAS VTA and VDMML

sbxkoenk — Mon, 03 Jun 2024 11:21:28 GMT

I'm not so sure this is the best little project to learn SAS programming ... but anyway.

In SAS terminology, multi-class classification (and multi-label classification) is always supervised.

You probably need unsupervised learning instead, that is, clustering or topic detection.

If there's no pipeline template for clustering in Model Studio (VDMML), you can always build such a pipeline yourself starting from a data node (or an empty pipeline).

After you have used Singular Value Decomposition (SVD) or Latent Dirichlet Allocation (LDA) to reduce the dimensionality of the weighted term-by-document frequency matrix, you can apply a clustering algorithm to the result. Note that clustering assigns every e-mail to exactly one cluster, whereas with topic detection in VTA a single e-mail may contain several topics.
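
To make the "reduce with SVD, then cluster" idea concrete outside Model Studio, here is a minimal sketch in plain Python. It is not SAS code and not what VTA does internally. The toy e-mails, the simple tf-idf weighting, the power-iteration SVD, and the k-means step are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter

# Toy e-mails, invented purely for illustration.
EMAILS = [
    "invoice payment due invoice",
    "payment overdue invoice reminder",
    "meeting agenda schedule tomorrow meeting",
    "schedule meeting room tomorrow",
]


def tfidf_rows(docs):
    """Weighted document-by-term matrix with weight tf * (log(n / df) + 1)."""
    tokens = [d.split() for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    df = Counter(t for doc in tokens for t in set(doc))
    n = len(docs)
    rows = []
    for doc in tokens:
        tf = Counter(doc)
        rows.append([tf[t] * (math.log(n / df[t]) + 1.0) for t in vocab])
    return rows


def svd_embed(rows, k, iters=500):
    """Document coordinates on the top-k singular directions.

    Runs power iteration with deflation on the Gram matrix A A^T; each
    eigenpair (lam, u) contributes the coordinate sqrt(lam) * u[i] for doc i.
    """
    n = len(rows)
    gram = [[sum(a * b for a, b in zip(rows[i], rows[j])) for j in range(n)]
            for i in range(n)]
    coords = [[] for _ in range(n)]
    for _ in range(k):
        v = [float(i + 1) for i in range(n)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * gram[i][j] * v[j] for i in range(n) for j in range(n))
        for i in range(n):
            coords[i].append(math.sqrt(max(lam, 0.0)) * v[i])
            for j in range(n):
                gram[i][j] -= lam * v[i] * v[j]  # deflate the found component
    return coords


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm with a deterministic farthest-point start."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Hard assignment: every document gets exactly one cluster label.
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


labels = kmeans(svd_embed(tfidf_rows(EMAILS), k=2), k=2)
for text, label in zip(EMAILS, labels):
    print(label, text)
```

Each e-mail ends up with a single cluster label, which is the hard-assignment behaviour described above. A topic model would instead give each e-mail a weight for every topic.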

Koen
